AD-A070 307

UNCLASSIFIED

PRINCETON UNIV N J DEPT OF STATISTICS  F/G 6/16
AN EMPIRICAL HIGHER-RANK ANALYSIS MODEL OF THE AGE DISTRIBUTION--ETC(U)
MAY 78  M B BRECKENRIDGE, J W TUKEY  DAAG29-76-G-0298
TR-143-SER-2
ARO-14244.6-M


Unclassified
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE
(READ INSTRUCTIONS BEFORE COMPLETING FORM)

TITLE (and Subtitle): AN EMPIRICAL HIGHER-RANK ANALYSIS MODEL OF
THE AGE DISTRIBUTION OF FERTILITY

TYPE OF REPORT & PERIOD COVERED: Technical

AUTHOR(s): Mary B. Breckenridge, John W. Tukey

CONTRACT OR GRANT NUMBER: DAAG29-76-G-0298

PERFORMING ORGANIZATION NAME AND ADDRESS:
Princeton University
Department of Statistics
Princeton, New Jersey 08540

CONTROLLING OFFICE NAME AND ADDRESS:
U. S. Army Research Office
P. O. Box 12211
Research Triangle Park, NC 27709

NUMBER OF PAGES: 108

SUPPLEMENTARY NOTES:
The view, opinions, and/or findings contained in this report are those of the
author(s) and should not be construed as an official Department of the Army
position, policy, or decision, unless so designated by other documentation.


KEY WORDS (Continue on reverse side if necessary and identify by block number):

time series model
exploratory data analysis
unifying patterns
distributions of fertility
demographic data
fertility models




This document has been approved for public release and sale; its
distribution is unlimited.


AN  EMPIRICAL  HIGHER-RANK  ANALYSIS  MODEL 


OF  THE  AGE  DISTRIBUTION  OF  FERTILITY 


by 

Mary  B.  Breckenridge 
Department  of  Statistics 
Princeton  University 




with 

THE  RELATIONSHIP  OF  EMPIRICAL  ANALYSIS 
TO  MORE  NARROWLY  MODELLED  ANALYSIS 

by 

John  W.  Tukey 
Princeton  University  and 
Bell  Laboratories 


Technical Report No. 143, Series 2
Department of Statistics
Princeton University
May 1978

Research for this report was supported in part
through a contract with the U. S. Army Research
Office, No. DAAG29-76-G-0298, awarded to the
Department of Statistics, Princeton University,
Princeton, New Jersey.


Abstract 


Empirical higher-rank (EHR) analysis of a 185-single-year
sequence of Swedish age-specific overall fertility rates, and of a
related 68-single-year marital fertility sequence, develops a time
series model of the changing age distribution of births in cross-
sectional and cohort perspectives. The procedure combines the
data-guiding and flexibility of Tukey's exploratory data analysis
(EDA) approach with robust/resistant methods of estimation to
detect  unifying  patterns  underlying  the  diverse  distributions  of 
fertility.  The  centrality  of  examination  of  residuals,  in  achieving 
optimal  fits  to  the  data,  in  detecting  singular  departures  from  fit, 
and  in  interpretation  of  the  fitted  descriptions,  is  emphasized. 

Techniques,  new  in  their  detailed  application  to  demographic  data, 
are  described  in  some  detail. 

The  close  fits  obtained  for  all  sequences,  coupled  with  guided 
choice  of  a standard  form  in  which  to  use  the  fitted  descriptions, 
reduce  the  variability  in  the  fertility  data  to  a concise  and  coherent 
demographic  picture  which  differs  in  important  ways  from  the 
descriptions  other  aggregate  fertility  models  have  provided,  while 
having  some  significant  relations  to  other  models.  Ways  of  refining 
still  further  the  EHR-fitted  descriptions  to  provide  additional  insight 
into the underlying structure of aggregate fertility distributions are
suggested. Use of this analytic approach to identify and deal with
errors in such data is discussed.


TABLE OF CONTENTS

Abstract

List of Tables

List of Figures

Introduction

A. Preliminary considerations

A1. Features of EHR analysis which may make it particularly
appropriate to the task of describing fertility distributions

A2. Focus of this EHR analysis of fertility

A3. Choice of data

B. Procedure

B1. Preparation of data

B2. EHR analysis

C. How well has the EHR analysis developed descriptions of the
Swedish fertility time sequences?

C1. Size of residuals

C2. Structure of residuals

C3. Ways of looking at residuals

C4. Conclusions about residuals

D. Re-presentation of fits in a standard form

E. Relations between the age distribution of births and the
components of a standard form EHR fit

F. Some demographic implications of the fertility components
derived by EHR analysis (examination of the 1892-1959 data)

F1. EHR components of marital and overall fertility histories

F2. Comparison of the EHR and Coale descriptions of marital
fertility

F3. Changes in the marital fertility distribution not accounted
for by changes in level of marital fertility

F4. Changes in the overall distribution not accounted for by
changes in the marital distribution of functioning activity
states

F5. Level-compensated distributions of functioning activity
states in overall and marital perspectives

F6. Use of standard-level-compensated distributions of births
to approximate age-specific proportions of married plus
cohabiting active women

G. Changes across time in cohort and cross-sectional overall
fertility patterns (selected observations on the full 1775-1959
fertility histories)

G1. Comparability of cohort and cross-sectional EHR
components

G2. Some intersections of cohort and cross-sectional age
distributions of functioning activity states, viewed through
EHR components

G3. Cohort vs. cross-sectional evidence for data points of
lesser accuracy

H. Conclusions

Footnotes

Figures

Appendix A. A robust/resistant procedure for the iterative fitting of
two multiplicative components to an M x N matrix

Appendix B. A selected standard form re-presentation of a rank-
two fit

Appendix C. Age distributions of natural fertility, reported and
approximated by EHR parameters

Appendix D. "The Relationship of Empirical Analysis to More
Narrowly Modelled Analysis" by John W. Tukey


LIST OF TABLES

Table 1. Features of Residuals from EHR Fitting of
$F_{ij} = \alpha_i A_j + \beta_i B_j$ to the Age Distribution of Fertility,
Expressed on the Folded Square Root Scale, in Selected
Time Sequences: 1775-1959.

Table 2. Reported and EHR Fitted Fertility Distributions on
Re-expressed and Raw Fraction Scales: Selected
Sequences and Selected Years, 1775-1959.

Table 3. Basis of Coding of Raw Fraction Scale Residuals in
Time Sequence Plots (Figs. 5, 6 and 7).

Table 4. Range of Contribution of Components $\alpha_i A_j$ and $\beta_i B_j$ to
the Fitted Description of the X15 Sequence, Folded
Square Root Scale.

Table 5. Age-Specific Fertility Distributions (raw scale) Based on
the Age Parameters $A_j$ and $B_j$ for X15 and Increments
in the Time Parameters $\alpha_i$ and $\beta_i$.

Table 6. Fertility Distributions Constructed from the Components
$\alpha_i A_j$ and $\beta_i B_j$ for Selected Marital and Overall Fertility
Sequences, 1892-1959.

Table 7. Regression Coefficients and Constants in Linear
Compensation of EHR Time Parameters for the Level of Fertility
and for Other Selected Factors.

Table 8. EHR Standard Distributions, EHR Level-Compensated
Distributions and "Natural" Distributions of Marital
Fertility, Selected Examples.

Table 9. EHR Level-Compensated Distributions of Cross-Sectional
Overall Fertility, Selected Years.

Table 10. Range of Contribution of Components $\alpha_i A_j$ and $\beta_i B_j$ to
the Fitted Description of the C15 Sequence, Folded Square
Root Scale.

Table 11. Age-Specific Fertility Distributions (raw scale) Based
on the Age Parameters $A_j$ and $B_j$ for C15 and Increments
in the Time Parameters $\alpha_i$ and $\beta_i$.

APPENDIX

Table B1. Fitted Age Parameters (folded square root scale)
Derived by EHR Analysis of the Age Distribution of
Overall and Marital Fertility in Selected Time
Sequences, 1775-1959.

Table B2. Results of Regressions to Fix EHR-Fitted Fertility
Distribution Parameters for Re-presentation of Fits
in a Standard Form.

Table B3. Fitted Age Parameters (folded square root scale)
After Standard Form Re-presentation of EHR Fits to
the Age Distributions of Overall and Marital Fertility
in Selected Time Sequences, 1775-1959.

Table C1. The Age Distributions of Natural Fertility, Reported and
Fitted as Weighted Sums of the $A_j$ and $B_j$ Derived by
EHR Analysis of the Swedish Age 20-49 Marital Fertility
Time Sequence for 1892-1959.


LIST OF FIGURES

Fig. 1. Time sequence plot of residuals by age cut from EHR fitting
(with c = 6 in the biweight) of $f_{ij} = \alpha_i A_j + \beta_i B_j$ to the cross-
sectional age 20-49 overall fertility sequence, 1775-1959
(with data expressed on the raw fraction scale).

Fig. 2. Schematic plots of residuals by age cut (folded square root
scale) from EHR fitting of $F_{ij} = \alpha_i A_j + \beta_i B_j$ to the cohort age
15-49 overall fertility sequence, 1775-1929.

Fig. 3. Schematic plots of residuals by age cut (folded square root
scale) from EHR fitting of $F_{ij} = \alpha_i A_j + \beta_i B_j$ to the cross-
sectional age 15-49 overall fertility sequence, 1775-1959.

Fig. 4. Scatter plots of residuals for pairs of age cuts (folded
square root scale) from EHR fitting of $F_{ij} = \alpha_i A_j + \beta_i B_j$ to
the cross-sectional age 15-49 overall fertility sequence,
1775-1959.

Fig. 5. Time sequence plot of coded residuals by age cut (raw
fraction scale) from EHR fitting of $F_{ij} = \alpha_i A_j + \beta_i B_j$ to the
cross-sectional age 15-49 overall fertility sequence,
1775-1959.

Fig. 6. Time sequence plot of coded residuals by age cut (raw
fraction scale) from EHR fitting of $F_{ij} = \alpha_i A_j + \beta_i B_j$ to the
cohort age 15-49 overall fertility sequence, 1775-1929.

Fig. 7. Time sequence plot of coded residuals by age cut (raw
fraction scale) from EHR fitting of $F_{ij} = \alpha_i A_j + \beta_i B_j$ to the
cross-sectional age 15-49 marital fertility sequence,
1892-1959.

Fig. 8. EHR standard form time parameters $\alpha_i$ and $\beta_i$, and total
rate of fertility, for cross-sectional age 15-49 and age
20-49 marital fertility sequences, 1892-1959.

Fig. 9. EHR standard form time parameters $\alpha_i$ and $\beta_i$, and total
rate of fertility, for cross-sectional age 15-49 and age
20-49 overall fertility sequences, 1892-1959.

Fig. 10. EHR model vs. Coale model expressions of the degree
of skewness of the cross-sectional marital fertility
distributions, 1892-1959.

Fig. 11. EHR standard form marital fertility time parameters $\alpha_i$
and $\beta_i$, linearly compensated for total rate of marital
fertility (cross-sectional age 15-49 and age 20-49 marital
fertility sequences, 1892-1959).

Fig. 12. EHR standard form overall fertility time parameters $\alpha_i$
and $\beta_i$ (cross-sectional age 15-49 and age 20-49 overall
fertility sequences, 1892-1959) linearly compensated for
the corresponding EHR standard form marital fertility
time parameters (cross-sectional age 15-49 and age
20-49 marital fertility sequences, 1892-1959).

Fig. 13. Age-specific proportions of women reported married,
and age-specific proportions of married plus cohabiting
active women estimated from EHR-derived level-
compensated distributions of overall and marital fertility
and the total rates of overall and marital fertility,
1892-1959.

Fig. 14. EHR standard form time parameters $\alpha_i$ and $\beta_i$, and
total rate of fertility, for cross-sectional age 15-49 and
age 20-49 overall fertility sequences, 1775-1959.

Fig. 15. EHR standard form time parameters $\alpha_i$ and $\beta_i$, and
total rate of fertility, for cohort age 15-49 and age 20-49
overall fertility sequences, 1775-1929.

Fig. 16. Some relations between cohort and cross-sectional fertility
distributions, demonstrated with EHR standard form time
parameters $\alpha_i$ and $\beta_i$ for cohort and cross-sectional age
15-49 sequences, 1775-1959.
A. Cohort parameters centered on year at age 30-54.
B. Cohort parameters centered on year at age 42-46.

Fig. 17. EHR standard form time parameter $\beta_i$ before and after
linear compensation for the total rate of fertility, cohort
age 15-49 overall fertility sequence, 1775-1929.


AN  EMPIRICAL  HIGHER-RANK  ANALYSIS  MODEL  OF  THE 


AGE  DISTRIBUTION  OF  FERTILITY* 


Introduction 

In efforts to understand the historical decline of fertility in the now
industrialized countries and the potential for fertility decline in the less
developed countries, a variety of ingenious measures of fertility have
been constructed from the available data. Such measures simplify
comparisons across time and place, and they can sharpen the exploration of
associations between fertility change and social and economic change.
The value of multiple approaches applied to data from diverse sources
has been repeatedly demonstrated.^2 The problems of dealing with
incomplete and error-ridden data continue to be a challenge for new
methodology.

While age-marriage duration-specific and age-parity-specific
fertility schedules are highly desirable for analysis of fertility change, data
needed to calculate such schedules have not been available for most time
periods and populations. The relative abundance of simple age-specific
fertility schedules (both current and for earlier periods, and usually by
five-year age groups) has encouraged efforts to use, instead, this type
of data to construct models containing a small number of meaningful
parameters for comparison. Two types of such models, both based on
pre-selected patterns, have been prominent. The first of these seeks to
express the net maternity function in terms of specific functional forms.
The second relates the age distribution of fertility to selected demographic
patterns, usually for an idealized population with specified constraints,
such as the absence of illegitimate births and the absence of widowhood
and divorce in the childbearing years. One of the most prominent
demographic models has been that of Coale, which then became the basis of
the Coale-Trussell model fertility schedules.^4 This model has had wide
application and is clearly superior in its descriptive capacity to any of the
single function models so far proposed.^5

The present study of the age distribution of fertility reports an
approach complementary to that of Coale and Trussell. It begins without
a priori assumptions about the exact shape or the causes of unifying
patterns which may underlie diverse age distributions of births in an actual
population rather than an idealized one. Using time histories of
single-year Swedish overall and marital fertility in both cross-sectional and
cohort perspectives, it adopts empirical higher-rank (EHR) analysis,
including a particular type of re-expression of the fractional fertility up
to some cut-off age, and shows

. how well this method of analysis develops a description of the
Swedish data;

. how well the EHR results, though quite empirical, fit into a
demographic model;

. some ways in which the EHR approach accommodates aspects of
fertility distributions that have been problematic for other models
when dealing with actual populations.

EHR analysis, as initially developed in McNeil and Tukey, combines

. the emphasis on data-guiding and flexibility, typical of Tukey's
exploratory data analysis (EDA) approach, with

. the avoidance of trouble from exotic values, and the relatively
high efficiency under any of a wide variety of circumstances,
typical of robust/resistant methods of estimation.

The work reported here could be the first steps in a complete EDA of the
diverse age distributions of fertility found in time series and across
populations.

A. Preliminary considerations

A1. Features of EHR analysis which may make it particularly
appropriate to the task of describing fertility distributions.

EHR analysis begins without detailed assumptions about the
distribution of the data (in this case, widely varying age patterns of births,
reflecting both biological and social factors). Its approach is exploratory,
guided at each step by what is left, the residuals, after some additive or
multiplicative factor has been removed from one dimension or another of
the data or of the residuals from the preceding step of the analysis. This
iterative identification of regularities in the data (here, in each of the
two-way tables of fertility distributions in time sequence) is continued until
the amount of additional regularity removed in a step seems negligible.

In the effort to identify as fully as possible the patterns which account
for the variability in the data, EHR analysis allows for the fact that some
data points may be far outside the "true" underlying pattern in the set,
because of inaccuracies or singular circumstances. Use of a form of
robust/resistant estimation (RRE) in the iterative fitting procedure gives
cell-wise weights to the residuals at each iteration before searching for
further adjustments to the fit. The choice of weight function is important
if one is simultaneously to ensure resistance to "outliers" and also avoid
giving undue weight to small residuals. At the present time, one weight
function of choice appears to be the bisquare function of the residuals.
The iterative fitting procedure using this "biweight" is described in
Appendix A.
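The weighting idea in the paragraph above can be illustrated with a minimal Python sketch. It is not the Appendix A procedure: the function names, the use of the median absolute residual as the scale, and the simple reweighted-mean iteration are assumptions made for illustration. The tuning constant c defaults to 12, the value this report adopts in section B2.

```python
import numpy as np

def biweight_weights(residuals, c=12.0):
    """Bisquare (biweight) weights for a set of residuals.

    Assumption: the scale is the median absolute residual, so any
    residual larger than c times that median receives zero weight.
    """
    r = np.asarray(residuals, dtype=float)
    scale = np.median(np.abs(r))
    if scale == 0.0:
        # Degenerate case: at least half of the residuals are exactly zero.
        return (r == 0.0).astype(float)
    u = r / (c * scale)
    w = (1.0 - u**2) ** 2
    w[np.abs(u) >= 1.0] = 0.0  # "exotic" values are ignored entirely
    return w

def biweight_center(values, c=12.0, tol=1e-10, max_iter=100):
    """Iteratively reweighted central value (a hypothetical helper)."""
    x = np.asarray(values, dtype=float)
    center = np.median(x)  # resistant starting point
    for _ in range(max_iter):
        w = biweight_weights(x - center, c)
        if w.sum() == 0.0:
            break
        new_center = np.sum(w * x) / np.sum(w)
        if abs(new_center - center) < tol:
            center = new_center
            break
        center = new_center
    return center
```

Small residuals get weights near 1, weights fall off smoothly as residuals grow, and a single wild value has no influence on the fitted center.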

Re-expression of data to improve linearity before analysis (a
practice already well demonstrated as valuable in demographic research) is a
usual preliminary to EHR analysis. Without altering the order relations
of the members of the data set, this takes advantage of the greater ease
of examining linear (here, additive-multiplicative) relations and departures
from them. Whether log, reciprocal, power, or more complex
re-expression has been used to simplify the data's behavior, de-transformation
can readily return results to the original, raw, scale.

Because data are almost always the result of indirect or imperfect
measurement, residuals of varying sizes are expected in the analysis.
A distinctive feature of the EDA approach is that nothing is discarded.
The EHR analysis does not stop with a model and a statement of percent
of the total variation in the data explained by the model. Instead, the
residuals are not only examined for further pattern at each stage of the
fitting procedure but are also retained and examined in detail at the end,
to see where and in what way the data depart from the fitted description
(e.g., in specific years? at specific age cuts?). This examination directs
both interpretation and further exploratory analysis. It often enhances
identification of major transitions in underlying pattern. It may add to
understanding of the effects of singular events. (For example, within the
context of a population's trend in age distribution of fertility, what effects
has a war or a period of increased emigration had on childbearing
patterns?) Examination of the residuals may also aid in identifying
departures which are due to errors, and then in estimating appropriate
corrections. In a complete EDA, such a process may need repetition,
particularly after "fine tuning" the expression of the data.

A2.  Focus  of  this  EHR  analysis  of  fertility 

The biological and social factors acting on fertility may affect its age
distribution, its level, or both. Consistent with the current body of
demographic work on fertility models, we will focus first on the age
distribution, and specifically on the proportion of all births, for a year or a
cohort, which is attributable to women a given age and younger. The
advantages of using the cumulative distribution, rather than the frequency
distribution, include:

. the greater ease of fitting the class of distributions which differ
in location and scale;

. the fact that the nature of any systematic bias in the data will
be more apparent in a cumulative distribution at the same time
that the influence of singular departures will be diminished.

We shall speak throughout of "age cuts" of this cumulative distribution,
for example "cut at 24/25" to indicate a separation of the proportion of
births to women aged 24 and below from that to women aged 25 and above.
In the final sections, the change in overall level of fertility will be related
to the parameters of the fitted fertility time sequences.

A3. Choice of data

A time series with diverse fertility distributions allows examination
not only of the diversity but also of the dynamics of change. Choice of
the Swedish series was based on:

. the length of the series of single-year schedules available;^9

. the high quality of the data;^10

. the striking changes in both the age pattern of marriage and
the level of fertility, which should be reflected in the age
distribution of births;

. the occurrence of singular events (wars, periods of high
emigration, periodic severe crop failures) which might be
expected to affect temporarily the level and/or the age
distribution of births; and

. the rich detail in the recorded marriage and fertility data to aid
examination of demographic significance of the model parameters.

To test the use of EHR analysis, 1775-1959 overall fertility and 1892-1959
marital fertility were selected.


Of the demographic factors which affect the age distribution of births,^13
some, in addition to parity-dependent limitation and spacing of births,
are prominent for the Swedish data selected. A late marriage pattern
prevailed, particularly from the second quarter of the 19th century until
1935. The proportion married at all childbearing ages above 19 then
increased dramatically within 15 years (for example, from 0.17 to 0.40
for those aged 20-24, and from 0.48 to 0.72 for those aged 25-29).
The proportion aged 15-19 married had nevertheless increased only from
0.02 to 0.05 by 1959. With significant adoption of parity-dependent
limitation of births, the pattern of marital fertility decline with age (both in
cohort and in cross-sectional perspectives) therefore reflects two influences:

. the tendency for later-marrying women to have low-order births
well beyond age 25, and

. the tendency for earlier-marrying women to restrict their
childbearing to younger ages.^16

New prominence of divorce and remarriage in the final years of the
1775-1959 histories may also have influenced the aggregate fertility patterns.

The traditionally large proportion of first births premaritally
conceived is of particular importance for the age distribution of marital
fertility. Entry into marriage becomes a better proxy for entry into
childbearing than is possible with high incidence of post-marital delay of
a first birth (due either to contraception or subfecundity). To the extent
to which marriage customs select for those of proven fecundity, however,
not only at ages 15-19 but also at ages 20-24 in this late-marrying
population, the age distribution of marital fertility shows some distortion
toward the lower ages. This may go far in explaining the historical absence
in Sweden of a "natural" marital fertility pattern (as this has been identified
and defined by Henry^18) even considering only births to women age 20 and
above. Illegitimate fertility has also been non-negligible for the data in
the present study. In each of the years from 1892 to 1959, between 9% and
15% of the births contributing to the age pattern and total rate of overall
fertility are not included in the age pattern and total rate of marital fertility.

B. Procedure

B1. Preparation of data

Yearly age-specific overall fertility rates were calculated for the
years 1775-1959^19 from recorded confinements by age of mother in
five-year age groups, 15-19 to 45-49, and the female population in each of
these age groups as this is either recorded in the census or recorded as an
intercensal estimate.^21 These schedules were considered in cross-sectional
perspective in the 185-year sequence. They were also considered separately
for 155 complete cohorts, using the customary procedure of identifying a
group of women by the year in which they were aged 15-19, 1775-1929, and
following their childbearing by five-year age groups until they reached age
45-49, 1805-1959.^21

Each schedule was first cumulated to give the fertility rates for
women a given age and younger, with age cuts at 19/20, 24/25, ..., 49/50.
Each schedule was then normalized, i.e., divided by the rate for women
at age cut 49/50, to give the proportion of the year's (or cohort's) births
achieved by a given age.^22 Cross-sectional and cohort time sequences of
overall fertility for ages 20-49 alone were also prepared for analysis in
the same way. For brevity, these four sets of data will be referred to as:
X15 and X20 (cross-sectional, age 15-49 and age 20-49, overall); C15 and
C20 (cohort, age 15-49 and age 20-49, overall).
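The cumulation and normalization steps just described can be sketched in a few lines of Python. The function name and the rates in the usage note are hypothetical, chosen only to make the arithmetic easy to follow; they are not Swedish data.

```python
import numpy as np

# Five-year age groups and the age cut that closes each one.
AGE_GROUPS = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49"]
AGE_CUTS = ["19/20", "24/25", "29/30", "34/35", "39/40", "44/45", "49/50"]

def cumulate_and_normalize(rates):
    """Turn age-specific rates into cumulative proportions of births.

    `rates` holds one schedule (or one schedule per row), ordered by
    age group. Each schedule is cumulated over age and then divided
    by its value at the last age cut (49/50).
    """
    rates = np.asarray(rates, dtype=float)
    cumulated = np.cumsum(rates, axis=-1)
    return cumulated / cumulated[..., -1:]
```

For an invented schedule `[0.02, 0.10, 0.14, 0.12, 0.08, 0.03, 0.01]` the result is `[0.04, 0.24, 0.52, 0.76, 0.92, 0.98, 1.0]`. The value at cut 49/50 is 1 by construction, so six cuts carry information for an age 15-49 schedule. An age 20-49 schedule would be obtained by dropping the first rate before cumulating.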

For comparative analysis of overall and marital fertility distributions,
four time sequences were used: cross-sectional marital fertility for ages
15-49 and ages 20-49 for the years 1892-1959; and the corresponding
overall fertility sequences for these 68 years. These data sets will be
referred to as MX15, MX20, XX15 and XX20.^23,24

The several sequences truncated to age 20-49 were included for two
reasons:

. to enhance comparisons with those studies and models which
exclude age 15-19 fertility, directly, or indirectly through the
choice of external standards;

. to test the relative capacity of the EHR approach to describe,
empirically and demographically, fertility histories with full
variability and histories with some known sources of irregularity
removed.^25

B2. EHR analysis

Preliminary EHR analysis of 20th century U.S. data had suggested
that this analytic approach can describe changing overall fertility patterns
more or less well in several forms. No consideration was given in this
earlier work, however, to the possible demographic significance of such
a description. The present analysis of Swedish overall and marital fertility
time sequences tested in detail a variety of combinations of re-expression
of data and weighting of residuals in the iterative removal of additive
and/or multiplicative components from the data and from successive sets
of residuals.^27,28 At this stage of exploration, descriptions were restricted
to five parameters. Each combination was tested for goodness-of-fit to
the data by two criteria:

. the extent to which the final set of residuals appeared to be free
of pattern or structure;

. the proportion of the absolute variation in the data not explained,^30

$$\frac{\sum_{i,j} |z_{ij}|}{\sum_{i,j} |F_{ij} - \operatorname{median} F_{ij}|}$$

where $z_{ij}$ = residual for year i (or cohort i) and age cut j
$F_{ij}$ = cumulated, normalized, re-expressed fertility rate
for year i (or cohort i) and age cut j
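The second criterion can be computed as a ratio of summed absolute values. The Python sketch below is illustrative only. It assumes that the numerator is the sum of absolute residuals and that the median in the denominator is taken over the whole table; the source is not fully legible on either point. The function name is hypothetical.

```python
import numpy as np

def unexplained_fraction(F, fitted):
    """Share of absolute variation in F left in the residuals.

    Assumptions (not confirmed by the source): numerator is the sum
    of |z_ij|, denominator is the sum of |F_ij - median(F)| with the
    median taken over all cells of the table.
    """
    F = np.asarray(F, dtype=float)
    z = F - np.asarray(fitted, dtype=float)
    return np.sum(np.abs(z)) / np.sum(np.abs(F - np.median(F)))
```

A value near 0 means the fitted description accounts for almost all of the absolute variation; a value near 1 means it accounts for very little.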


Results  are  presented  here  for  one  effective  re  -expre  s s ion-we ighting 
combination  (as  judged  by  this  pair  of  criteria)  combined  with  the  iterative 
selection  of  two  time-by-age  components  for  each  series. 

The selected re-expression is the folded square root of the cumulated,
normalized fertility schedule

$$ F_{ij} = (f_{ij})^{1/2} - (1 - f_{ij})^{1/2} $$

This centers the distribution on the mean of the schedule and stretches
and straightens both ends of the cumulated fertility distribution's sigmoid
configuration. The values for a given schedule are thus transformed from
the range (0.0 to 1.0) to the range (-1.0 to 1.0) for analysis.
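The re-expression and its inverse can be sketched in a few lines. This is only an illustration: the function names `folded_sqrt` and `unfold_sqrt` are ours, and the inverse is derived algebraically from the formula above rather than taken from the report. The value 0.1402 is the reported raw fraction at age cut 24/25 for 1890 in Table 2, whose re-expressed value is listed as -0.5528.

```python
import math

def folded_sqrt(f):
    """Folded square root of a cumulated fraction f in [0, 1]."""
    return math.sqrt(f) - math.sqrt(1.0 - f)

def unfold_sqrt(F):
    """Inverse re-expression: recover f from F in [-1, 1].

    With s = sqrt(f) and c = sqrt(1 - f) we have s - c = F and
    s**2 + c**2 = 1, hence s + c = sqrt(2 - F**2) and
    s = (F + sqrt(2 - F**2)) / 2.
    """
    s = (F + math.sqrt(2.0 - F * F)) / 2.0
    return s * s

# End points, the half-way point, and two raw fractions from Table 2.
cuts = [0.0, 0.1402, 0.5, 0.9813, 1.0]
re_expressed = [folded_sqrt(f) for f in cuts]
recovered = [unfold_sqrt(F) for F in re_expressed]
```

The re-expressed value is zero where the cumulated fraction reaches one half, and the end points 0 and 1 map to -1 and 1.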

Setting c = 12 in the biweight function used in the robust/resistant
estimation (so that residuals whose absolute value is greater than 12 times
the median absolute value of residuals are given zero weight) proves to be
an appropriate choice. This includes in the fitted parameters a high
proportion of the systematic variability in the data, while still avoiding
difficulties with "outliers."
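The weighting rule can be illustrated as follows. This section does not spell out the biweight function, so the sketch assumes the usual Tukey form, w = (1 - u^2)^2 for |u| < 1 and 0 otherwise, with u the residual divided by c times the median absolute residual. The function name `biweight_weights` and the sample residuals are hypothetical.

```python
import statistics

def biweight_weights(residuals, c=12.0):
    """Biweight weights, assuming the standard Tukey form.

    u = r / (c * median(|r|)); weight (1 - u**2)**2 when |u| < 1, else 0.
    Assumes the median absolute residual is not zero.
    """
    scale = c * statistics.median(abs(r) for r in residuals)
    weights = []
    for r in residuals:
        u = r / scale
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return weights

# Made-up residuals: four of ordinary size and one gross outlier.
sample = [0.001, -0.002, 0.003, -0.001, 0.05]
weights = biweight_weights(sample)
```

With c = 12 the ordinary residuals keep weights close to one, while the last residual, more than 12 times the median absolute residual, receives zero weight.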

The description developed of each two-way fertility table can be
expressed:

$$ F_{ij} = \alpha_i A_j + \beta_i B_j + z_{ij} $$

where $F_{ij}$ = re-expressed fertility rate for
                 year i (or cohort i) and age cut j
      $\sum_j A_j^2 = \sum_j B_j^2 = 1$ (for standardization)
      $z_{ij}$ = residual for year i (or cohort i)
                 and age cut j


The order of fitting for this expression emphasizes the patterns in the age
dimension of the fertility matrix:

. the iterative selection of a central value for the distribution of
fertility within each age cut across time, the formation of an
age vector from these six "biweight centers", and determination
of this vector's variability in the time dimension;

. from the residuals, iterative selection of a second central value
for each age cut, formation of a second age vector from these
six "biweight centers", and determination of this vector's
variability in the time dimension;

. iterative improvement of the two age vectors (and their scalar
multipliers) from successive sets of residuals until convergence
occurs for $A_j$ and $B_j$.[32]

Since EHR analysis places few constraints on the fitting procedure
(for example, not requiring that the age vectors be orthogonal) it is possible
for the scalar multipliers of the iteratively selected vectors to be more or
less linearly related. This occurs when the distribution shows appreciable
variation at the first age cut (as with the X20, MX15 and MX20 time
sequences[33] or in cross-population comparisons) in contrast to variation
largely confined to the central age cuts (as with the X15 sequence where
age 15-19 fertility is always low). This outcome gives emphasis to the
importance of standard form re-presentations of fits as discussed below,
page 26.


In trying to develop as sharp as possible a mathematical description
of the regularities underlying the data's variability, one starts with
EHR  analysis.  Knowledge  of  the  data  directs  re-presentation  of  the  selected 
fit  in  a standard  form;  the  re-presented  fit  then  directs  further  analysis 
and  interpretation. 

C. How well has the EHR analysis developed descriptions of the Swedish
fertility time sequences?


The motivation of this study of the age distribution of fertility was
exploratory--and successful exploration relies heavily on examination of
the residuals, not only their size, but more importantly their distribution.
Later, interpretation of parameters is also enhanced by attention to
idiosyncrasies in the residuals. Our discussion therefore starts with the
residuals, using those from the X15 and C15 histories to illustrate some
of the considerations.

C1. Size of residuals

Size of the residuals is mentioned first, only because it will be
important for the reader to have in mind how relatively small, in these
cases, the residuals are that are then being examined for structure. The
familiar expression of proportion of squared variation explained:

$$ 1 - \frac{\sum_{i,j} z_{ij}^2}{\sum_{i,j} (F_{ij} - \operatorname{mean} F_{ij})^2} $$

showed fits of 99.96-99.99% to be not uncommon for these fertility
histories when comparing various combinations of re-expression and fitting
conditions. All fits are reported here (Table 1) in terms of the less
extreme, and somewhat more useful proportion of the linear variation
explained, based on sum of the absolute deviations

$$ 1 - \frac{\sum_{i,j} |z_{ij}|}{\sum_{i,j} |F_{ij} - \operatorname{median} F_{ij}|} $$

One reason for the greater usefulness is reduced attention to a few large
residuals, likely to come from specific events or specific errors in data
collection. In slightly different contexts, it may be desirable to use even
more resistant measures of quality of fit.[34, 35]
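The contrast between the two measures can be seen in a small sketch. The function names are ours. The inputs are the six re-expressed reported values and residuals listed for 1890 in Table 2, with the first age cut taken as negative. Because only one year is used, the results illustrate the arithmetic and are not the sequence-wide figures of Table 1.

```python
import statistics

def squared_fit(values, residuals):
    """Proportion of squared variation explained (deviations from the mean)."""
    mean_value = sum(values) / len(values)
    total = sum((v - mean_value) ** 2 for v in values)
    return 1.0 - sum(r * r for r in residuals) / total

def linear_fit(values, residuals):
    """Proportion of linear variation explained (deviations from the median)."""
    median_value = statistics.median(values)
    total = sum(abs(v - median_value) for v in values)
    return 1.0 - sum(abs(r) for r in residuals) / total

# Re-expressed reported values and residuals for one year (1890, Table 2).
F_1890 = [-0.8774, -0.5528, -0.1774, 0.1869, 0.5424, 0.8536]
z_1890 = [0.0007, 0.0051, 0.0027, -0.0064, -0.0043, 0.0069]

squared = squared_fit(F_1890, z_1890)
linear = linear_fit(F_1890, z_1890)
```

For these six values the squared measure exceeds 99.99% while the linear measure is close to 99.2%, which is the sense in which the linear measure is the less extreme of the two.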


Reported and fitted distributions, for representative years with very
different age distributions of fertility[36] and for some years of least good
fit,[37] are given in Table 2 on both re-expressed and raw fraction scales.
Except at two of the 1110 cross-sectional age cuts and 63 of the 930 cohort
age cuts, the difference between reported and fitted values is in no more
than the third decimal place on the raw fraction scale. Even working with
the full marital fertility distribution, including age 15-19 births, results
in very close fits.[38, 39]


C2. Structure of residuals

In  practice,  examination  of  the  residuals  for  structure  should  come 
first  in  deciding  between  various  combinations  of  re-expression  and  fitting 
conditions.  For  data  in  time  sequence,  large  residuals  with  truly  irregular 
distribution  suggest 

• the  presence  of  errors  in  the  data,  or 

• the  impact  of  singular  events. 

At  another  extreme,  large  and  highly  patterned  residuals,  perhaps  for  the 
end  of  a time  sequence  after  an  extended  period  of  very  close  fit,  indicate 
a major  transition  in  the  underlying  patterns,  and  suggest  the  need  for 

• alteration  in  the  specified  combination  of  re-expression  and  fitting, 
to  accommodate  the  new  pattern  also,  and/or 

• division of the series at that period to analyze the portions
separately.


Table 1. Features of Residuals from EHR Fitting of $F_{ij} = \alpha_i A_j + \beta_i B_j$
         to the Age Distribution of Fertility, Expressed on the
         Folded Square Root Scale, in Selected Time Sequences:
         1775-1959

            % Variation                          Residuals
            Explained                       Median       Upper
Sequence    (linear scale)   s_bi           Absolute     90%

X15         99.0             4.14x10^-5     .0046        .0101
X20         98.9
C15         98.5             8.07x10^-5     .0057        .0160
C20         98.3
MX15        98.9             1.76x10^-5     .0026        .0072
MX20        99.1

Table 2. Reported and EHR Fitted Fertility Distributions on
         Re-expressed and Raw Fraction Scales: Selected
         Sequences and Selected Years, 1775-1959

Source of                                Age Cut
distribution       19/20     24/25     29/30     34/35     39/40     44/45

X15 1890
 Re-expressed
  Reported       -0.8774   -0.5528   -0.1774    0.1869    0.5424    0.8536
  Fitted         -0.8781   -0.5579   -0.1801    0.1933    0.5467    0.8467
  Residual        0.0007    0.0051    0.0027   -0.0064   -0.0043    0.0069
 Raw fraction
  Reported        0.0134    0.1402    0.3755    0.6310    0.8542    0.9813
  Fitted          0.0133    0.1375    0.3737    0.6354    0.8565    0.9795
  Residual        0.0001    0.0027    0.0018   -0.0044   -0.0023    0.0018

X15 1950
 Re-expressed
  Reported       -0.6663   -0.2035    0.1929    0.5131    0.7709    0.9412
  Fitted         -0.6720   -0.2029    0.2023    0.5189    0.7682    0.9376
  Residual        0.0057   -0.0006   -0.0094   -0.0058    0.0027    0.0036
 Raw fraction
  Reported        0.0844    0.3576    0.6352    0.8381    0.9570    0.9967
  Fitted          0.0819    0.3580    0.6416    0.8414    0.9561    0.9963
  Residual        0.0025   -0.0004   -0.0064   -0.0033    0.0009    0.0004

X15 1792 (a "worst fit")
 Re-expressed
  Reported       -0.8410   -0.5345   -0.1789    0.1829    0.5610    0.8267
  Fitted         -0.8516   -0.5378   -0.1697    0.1928    0.5352    0.8253
  Residual        0.0106    0.0033   -0.0092   -0.0099    0.0258    0.0014
 Raw fraction
  Reported        0.0219    0.1501    0.3745    0.6283    0.8641    0.9743
  Fitted          0.0193    0.1483    0.3809    0.6351    0.8503    0.9739
  Residual        0.0026    0.0018   -0.0064   -0.0068    0.0138    0.0004

Table 2 - cont'd.

Source of                                Age Cut
distribution       19/20     24/25     29/30     34/35     39/40     44/45

C15 1890
 Re-expressed
  Reported       -0.8685   -0.5007   -0.0854    0.2947    0.6352    0.8947
  Fitted         -0.8655   -0.4925   -0.0875    0.2890    0.6308    0.9047
  Residual       -0.0030   -0.0082    0.0021    0.0057    0.0044   -0.0100
 Raw fraction
  Reported        0.0153    0.1689    0.4397    0.7038    0.9013    0.9899
  Fitted          0.0160    0.1735    0.4383    0.7001    0.8992    0.9917
  Residual       -0.0007   -0.0046    0.0014    0.0037    0.0021   -0.0018

C15 1920
 Re-expressed
  Reported       -0.7256   -0.2852    0.1075    0.4238    0.6934    0.9342
  Fitted         -0.7264   -0.2926    0.1067    0.4327    0.7033    0.8837
  Residual        0.0008    0.0074    0.0008   -0.0089   -0.0099    0.0505
 Raw fraction
  Reported        0.0596    0.3025    0.5758    0.7859    0.9273    0.9959
  Fitted          0.0593    0.2976    0.5752    0.7913    0.9314    0.9878
  Residual        0.0003    0.0049    0.0006   -0.0054   -0.0041    0.0081

C15 1929 (a "worst fit")
 Re-expressed
  Reported       -0.7625   -0.3925    0.0023    0.4608    0.7777    0.9480
  Fitted         -0.7681   -0.2969    0.1314    0.4772    0.7616    0.9472
  Residual        0.0056   -0.0956   -0.1291   -0.0164    0.0161    0.0008
 Raw fraction
  Reported        0.0459    0.2334    0.5016    0.8081    0.9593    0.9974
  Fitted          0.0440    0.2947    0.5925    0.8176    0.9538    0.9974
  Residual        0.0019   -0.0613   -0.0909   -0.0095    0.0055    0.0000

Table 2 - cont'd.

Source of                                Age Cut
distribution       19/20     24/25     29/30     34/35     39/40     44/45

MX15 1905
 Re-expressed
  Reported       -0.2868    0.0298    0.2789    0.5023    0.7130    0.9081
  Fitted         -0.2888    0.0325    0.2789    0.4983    0.7129    0.9094
  Residual        0.0020   -0.0027    0.0000    0.0040    0.0001   -0.0013
 Raw fraction
  Reported        0.3014    0.5211    0.6933    0.8320    0.9354    0.9923
  Fitted          0.3001    0.5230    0.6934    0.8298    0.9354    0.9925
  Residual        0.0013   -0.0019   -0.0001    0.0022    0.0000   -0.0002

MX15 1930
 Re-expressed
  Reported       -0.1107    0.2162    0.4395    0.6215    0.7903    0.9349
  Fitted         -0.1089    0.2091    0.4386    0.6280    0.7941    0.9288
  Residual       -0.0018    0.0071    0.0009   -0.0065   -0.0038    0.0061
 Raw fraction
  Reported        0.4219    0.6511    0.7954    0.8947    0.9634    0.9960
  Fitted          0.4232    0.6462    0.7948    0.8979    0.9646    0.9953
  Residual       -0.0013    0.0049    0.0006   -0.0032   -0.0012    0.0007

MX15 1955 (a "worst fit")
 Re-expressed
  Reported       -0.0345    0.2999    0.5397    0.7211    0.8652    0.9620
  Fitted         -0.0254    0.3029    0.5328    0.7147    0.8635    0.9731
  Residual       -0.0091   -0.0030    0.0069    0.0064    0.0017   -0.0111
 Raw fraction
  Reported        0.4756    0.7072    0.8528    0.9386    0.9839    0.9986
  Fitted          0.4820    0.7092    0.8490    0.9361    0.9835    0.9993
  Residual       -0.0064   -0.0020    0.0038    0.0025    0.0004   -0.0007


Frequently encountered, also, are residuals of intermediate size
and irregular enough distribution that existing pattern isn't readily seen.
Then, detailed examination of the residuals is needed

. to bring out the hidden regularities, and

. to indicate what change in data re-expression or fitting will
accommodate the detected additional pattern.

Two useful procedures are scatter plots, illustrated below (see page 20), and
diagnostic plots.[40]

In  these  examples  of  irregular  and  patterned  residuals,  the  sum  of 
the  absolute  deviations  from  fit  may  actually  be  the  same;  it  is  the  location 
of  deviations  that  determines  the  analyst's  response  and  leads  to  better 
understanding  of  the  data.  Figure  1 shows  a plot  of  residuals  which 
suggests  alteration  in  specified  re-expression  and  fitting,  and  also  leads 
to  consideration  of  the  period  around  1910-1920  as  a major  transition  in 
the  fertility  history.  Identified  structure  in  even  very  small  residuals, 
as  in  the  present  report,  can  suggest  directions  to  move  in  seeking  still 
sharper  descriptions  of  the  data. 

C3. Ways of looking at residuals

Of a number of informative ways of looking at residuals, three are
illustrated here for the X15 and C15 analyses:

. schematic plots,

. scatter plots for pairs of age cuts, and

. time sequence plots for each age cut.


They demonstrate, in different ways, the extent to which the developed
descriptions have captured the underlying structure of the fertility histories.

(a) Schematic plots order, by size, the residuals for each age cut, and
summarize their distribution within an age cut. These plots give a
convenient picture of the degree of symmetry in the spreading of residuals
around the median residual for each age cut, and the degree of agreement
between age cuts in the amount of spreading of the residuals--both indicators
of the extent to which pattern remains in the residuals.[41] For C15 (Fig. 2),
90% of the residuals (all of those within the range from one interquartile
distance of the upper quartile to one interquartile distance of the lower
quartile) are quite similarly and evenly distributed for all age cuts; however,
the outliers, indicated by the open and shaded circles, are predominantly
positive for the first two and last two age cuts, predominantly negative for
the two central age cuts. In contrast, the X15 residuals (Fig. 3) are more
compact. While symmetrically distributed around the median for the first
four age cuts, they show dissimilarity between age cuts in the amount of
spread, however, particularly in the interquartile range. At even this
early stage of examination, one can conclude, then, that

. some structure remains in both cohort and cross-sectional
residuals;

. what structure is most visible lies in the outliers for the cohorts,
but appears in the main body of the residuals for the
cross-sectional series.

(b) Scatter plots summarize the size and sign relations of residuals
for pairs of age cuts. They use the residuals, ordered in time sequence
for any age cut, and plot them against the residuals ordered in time sequence
for each other age cut. This reveals any tendency for the residuals for a
pair of age cuts not to vary randomly with each other, but to vary together
by year in some systematic way--for example, to have the same sign and
magnitude for any selected year. To illustrate, the X15 residuals for age
cut 24/25

. show no systematic variation over time with those for age cut
29/30 (Fig. 4A),

. tend to be of opposite sign by year from those for age cut
39/40 (Fig. 4B),

. tend to be of the same sign by year as those for age cut
44/45 (Fig. 4C).[42]

(c) A time sequence plot of residuals by age cut can provide more
detail about remaining structure. We present such plots here with the
residuals expressed on the raw fraction scale so that the relative
importance of any residual can be judged directly. To emphasize the nature of
the small deviations, the residuals have also been coded in three classes:

   < (M - 1/4 I)      (M - 1/4 I) to (M + 1/4 I)      > (M + 1/4 I)

where M = median residual across all years and age cuts
      I = range of values between the residual at the lower
          25% point and the residual at the upper 25% point.

When the 1110 residuals for X15 are ordered by value alone, I is then
the range between the value of residual number 278 and the value of residual
number 832. Coding values are shown in Table 3 for all three sets of
residuals to be examined here in time sequence plots.
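The coding rule can be sketched as follows. The symbols "-", "0" and "+" are placeholders of ours, since the plotting symbols are not legible here. The positions of the 25% points (residual number ceil(n/4) and residual number floor(3n/4), counting from 1) are an assumption chosen to reproduce the positions 278 and 832 quoted for n = 1110.

```python
import math
import statistics

def code_residuals(residuals):
    """Code each residual as '-', '0' or '+' relative to M -/+ I/4."""
    ordered = sorted(residuals)
    n = len(ordered)
    m = statistics.median(ordered)
    # Assumed 25% points: 1-based positions ceil(n/4) and floor(3n/4).
    lower = ordered[math.ceil(n / 4) - 1]
    upper = ordered[math.floor(3 * n / 4) - 1]
    spread = upper - lower                       # the quantity I
    low_cut, high_cut = m - spread / 4, m + spread / 4
    codes = []
    for r in residuals:
        if r < low_cut:
            codes.append("-")
        elif r > high_cut:
            codes.append("+")
        else:
            codes.append("0")
    return codes, (m, lower, upper, spread, low_cut, high_cut)

# Stand-in data: the integers 1..1110 make the selected positions visible.
codes, summary = code_residuals(list(range(1, 1111)))
```

With 1110 stand-in values the lower and upper points are the 278th and 832nd ordered values, as in the text. Real residuals would be coded the same way.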

For the X15 residuals, such a plot (Fig. 5) confirms the schematic
plot impression of few singular departures from fit at any age cut. The
years 1783 and 1792 stand out as deviants. The plot also locates in the
time dimension some departures from random distribution which were
detected in the scatter plots of the pairs of X15 residual vectors. Age
cuts 34/35 and 39/40 tend to have residuals above the central interval
before 1855, then below the central interval until about 1937. Age
cuts 24/25 and 44/45 show the reverse, i.e., a tendency for
residuals below the central interval before 1855, above the central interval
until about 1937. This means, for example, that before 1855 a slightly
higher proportion of births is reported to have occurred between the ages
of 30 and 40 in most years than the fit would have predicted. Residuals
by age cut for X20 have the same systematic variations as those for X15,
indicating that departure from fit is not greatly influenced by age 15-19
fertility.

The lesser accuracy of the data in the early years of the series may
contribute to the observed residual pattern. The distribution of residuals
suggests also that a more complex model may remove the remaining
pattern.[43] In the present report we want to show how much can be learned
from a relatively simple EHR description of the fertility time history. The
decision on whether to go to a more complex model seeks appropriate
balance between parsimony and completeness of description. A model
sufficiently complex to give an excellent mathematical description of
almost all variation in a data set can usually be found, but is at least
somewhat more likely to have lost demographic significance of its
parameters.[44]

Table 3. Basis of Coding of Raw Fraction Scale Residuals in Time
         Sequence Plots (Figs. 5, 6 and 7)

            Median       Lower 25%    Upper 25%
Sequence    residual     point        point        I          M - 1/4 I    M + 1/4 I

X15         -.000058     -.00214      .00191       .00404     -.00107      .00095
C15         -.000057     -.00215      .00256       .00471     -.00124      .00112
MX15         .000047     -.00099      .00121       .00220     -.00050      .00060

Time sequence plots of residuals by age cut for C15 (shown in Fig. 6
in coded form on the raw fraction scale) present a very different picture
from that of X15 residuals. When the vector of residuals for any age cut
(e.g. 29/30) is compared to the vector of residuals for the next age cut
(34/35 in this example) shifted forward by five years, the fluctuations
appear highly correlated. This visual impression is reinforced by
calculated correlation coefficients (.63-.83) for the pairs of lagged residual
vectors on the folded square root scale used in the fitting. The fluctuations
are less pronounced, however, for about a 40-year span (e.g. from 1870 to
1910 for cohorts at age cut 24/25).
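A lagged comparison of this kind can be sketched as below. The function names and the series are hypothetical, and the direction of the shift (entry t of the first vector paired with entry t + 5 of the second) is an assumption about what "shifted forward by five years" means.

```python
import math

def pearson(x, y):
    """Ordinary product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(first_cut, next_cut, lag=5):
    """Correlate first_cut[t] with next_cut[t + lag] over the overlap."""
    return pearson(first_cut[:-lag], next_cut[lag:])

# Made-up residual vectors: the second repeats the first five steps later.
first = [0.004, -0.002, 0.001, 0.003, -0.005, 0.002, -0.001, 0.006,
         -0.003, 0.000, 0.002, -0.004, 0.005, 0.001, -0.002, 0.003,
         -0.001, 0.004, -0.006, 0.002]
second = [0.0] * 5 + first[:-5]

r_lagged = lagged_correlation(first, second, lag=5)
r_unlagged = pearson(first, second)
```

In this constructed case the lagged correlation is exactly one, which is the pattern the coefficients of .63-.83 point toward in the cohort residuals.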

Residuals by age cut for C20 have the same pattern as that observed
for C15. Residuals from EHR analysis of the cohort marital fertility
history for the last 38 cohorts in the overall fertility history, using the
same combination of re-expression and fitting, also exhibit the lagged
pattern.

At least two questions about these systematic variations need to be
explored:

. Has the use of cross-sectional data in five-year age groups to
construct the cohort histories contributed to the variation?[45]

. Does the lagged pattern represent a period effect on cohort
distribution--either the influence of real events which have
affected the childbearing of all age groups to some extent in
particular years, or the dissemination of cross-sectional errors
to a number of cohorts?[46]

Further exploration indicates that other re-expressions of the data coupled
with a more complex model can bring at least part of the lagged pattern
from the residuals into the fitted components when this is desirable.[47]

The very small residuals for MX15 (shown in Fig. 7 in coded form
on the raw fraction scale) and for MX20 include no singular departures
from fit, but some tendency at all age cuts for stretches either above or
below the central interval.

C4. Conclusions about residuals

1. Some structure remains in the small residuals of each of the
overall fertility time histories, X15, X20, C15 and C20. Extensions of
this EHR analysis would naturally ask:

. Are more complex descriptions of fertility distributions
appreciably better?

. Do other re-expressions of the data result in still better
fits?

2. The description of each of the histories is sufficiently good to
proceed to

. re-presentation of the fits in a standard form, and

. examination and interpretation of the resulting parameter
vectors which describe the major regularities inherent
in each of the histories.

D. Re-presentation of fits in a standard form

As suggested above (see page 12) the freedom from most constraints
in the EHR fitting procedure means that the final fitted representation of
the data is not unique but is one of many exactly equivalent linear
combinations of the parameters. To direct further analysis and to permit
comparison and interpretation of the fitted parameters of different fertility
sequences,[48] it is important to re-present each rank-two fit

$$ \alpha_i A_j + \beta_i B_j $$

as an identically equivalent standard form constrained rank-two fit

$$ K_1 \alpha_i^* A_j^* + K_2 \beta_i^* B_j^* $$

where $\alpha_i^* = a\alpha_i + b\beta_i$,
      $A_j^* = cA_j + dB_j$,
      a, b, c, d, $K_1$, and $K_2$ are constants,
      $\sum_j A_j^{*2} = \sum_j B_j^{*2} = 1$ for standardization,
      and the total number of constraints equals 4
      to ensure uniqueness.

For example, for the X15 history, the fitted rate (as expressed on the
folded square root scale) at age cut 34/35 in 1910 can be presented in the
identically equivalent forms:

$$ F_{136,4} = \alpha_{136} A_4 + \beta_{136} B_4 = (1.4548)(.1620) + (.0800)(.4772) = .2781 $$

$$ F_{136,4} = K_1 \alpha_{136}^* A_4^* + K_2 \beta_{136}^* B_4^* = (1.4759)(.9995)(.1170) + (.9992)(.2222)(.4754) = .2781 $$
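The non-uniqueness of a rank-two fit can be checked numerically. In the sketch below the time parameters are mixed by an arbitrary invertible 2x2 matrix and the age parameters by its inverse, and the fitted values are unchanged. The function names, the matrix `m` and the parameter values are all hypothetical. The standard form is the particular mixing that satisfies the four constraints.

```python
def rank_two_fit(alpha, beta, A, B):
    """Fitted values alpha_i * A_j + beta_i * B_j as a list of rows."""
    return [[a * Aj + b * Bj for Aj, Bj in zip(A, B)]
            for a, b in zip(alpha, beta)]

def remix(alpha, beta, A, B, m):
    """Mix the time parameters by the 2x2 matrix m, the age parameters by its inverse."""
    (p, q), (r, s) = m
    det = p * s - q * r                      # must be non-zero
    new_alpha = [p * a + r * b for a, b in zip(alpha, beta)]
    new_beta = [q * a + s * b for a, b in zip(alpha, beta)]
    new_A = [(s * Aj - q * Bj) / det for Aj, Bj in zip(A, B)]
    new_B = [(-r * Aj + p * Bj) / det for Aj, Bj in zip(A, B)]
    return new_alpha, new_beta, new_A, new_B

# Illustrative (made-up) parameters: three years, three age cuts.
alpha, beta = [1.2, 1.5, 0.9], [0.1, -0.2, 0.3]
A, B = [-0.6, 0.1, 0.5], [0.3, 0.5, 0.2]

original = rank_two_fit(alpha, beta, A, B)
mixed = rank_two_fit(*remix(alpha, beta, A, B, ((2.0, 1.0), (0.5, 3.0))))
```

Every entry of `mixed` agrees with the corresponding entry of `original` up to rounding error. This is why additional constraints are needed before fitted parameters can be compared across sequences.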


Constraints for standard forms of rank-two fits are of three types:

. "fixing" a vector (e.g. to be a constant or to be linear) or
requiring it to be as nearly as possible of a given form;

. making two vectors orthogonal;

. maintaining absence of a mixed row-by-column term,
$\alpha_i B_j$ or $\beta_i A_j$.

The four constraints chosen here for a standard form re-presentation of
the fitted fertility distributions were:

(a) to make $\alpha_i^* = a\alpha_i + b\beta_i$ as nearly constant as possible, by LS
regression (with 0 intercept) of a vector of repeating constants
on $\alpha_i$ and $\beta_i$. The two regression coefficients then equal a
and b.

(b) to make $A_j^* = cA_j + dB_j$ as nearly linear as possible, by a
canonical regression of two linearly independent straight lines
on $A_j$ and $B_j$.[49] The two elements of the eigenvector associated
with the largest canonical correlation are then equal to c and d.
($A_j^*$ and $B_j^*$ for all of the time sequences will be found in
Appendix B, Table B1.)

(c) and (d) to maintain diagonality, i.e. the absence of both $\alpha_i^* B_j^*$
and $\beta_i^* A_j^*$ in the expression.

Calculation (from $A_j$, $\alpha_i$, $B_j$, $\beta_i$, a, b, c, and d) of the $\beta_i^*$ and $B_j^*$ to meet
these requirements is shown in Appendix B. Detailed consideration of
alternative standard form re-presentations in the two-way case will be
found in unpublished work of Tukey.[50]
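Constraint (a) amounts to a two-predictor least-squares regression without intercept. A minimal sketch follows, assuming the "vector of repeating constants" is a vector of ones and solving the 2x2 normal equations directly. The function name and the data are hypothetical.

```python
def constant_combination(alpha, beta):
    """Least-squares a, b minimizing sum((1 - a*alpha_i - b*beta_i)**2).

    No intercept. Assumes alpha and beta are not proportional, so the
    normal equations have a unique solution.
    """
    saa = sum(x * x for x in alpha)
    sbb = sum(y * y for y in beta)
    sab = sum(x * y for x, y in zip(alpha, beta))
    sa, sb = sum(alpha), sum(beta)
    det = saa * sbb - sab * sab
    a = (sa * sbb - sb * sab) / det
    b = (saa * sb - sab * sa) / det
    return a, b

# Made-up time parameters built so that 0.7*alpha - 0.5*beta equals 1 exactly.
beta = [0.10, -0.20, 0.30, 0.05]
alpha = [(1.0 + 0.5 * t) / 0.7 for t in beta]

a, b = constant_combination(alpha, beta)
combined = [a * x + b * y for x, y in zip(alpha, beta)]
```

Because the made-up data satisfy the relation exactly, the regression recovers a = 0.7 and b = -0.5 and the combined vector is constant. With real parameters the combination would only be as nearly constant as the data allow.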

For all of the fertility time sequences the regressions to fix $\alpha_i^*$
are quite clear, with a fit of at least .983 based on sum of the absolute
deviations from a vector of constants (Appendix B, Table B2). The
greater degree of departure from constancy for the C15, C20, and MX15
sequences is attributable to a small number of members of each sequence
(see Figs. 8 and 15 below, pages 86 and 93).

The canonical correlation analysis to fix $A_j^*$ close to linearity
produces eigenvalues of at least .999 for the first canonical variate for
each time sequence (Appendix B, Table B2). This means that each of
the $A_j^*$ vectors, composed of $A_j$ and $B_j$ in proportion to the
corresponding elements of the first eigenvector, departs from linearity by no
more than 0.1%.

The "strength" of a constraint might be considered its relative
immunity to sampling/measurement fluctuations as transmitted by the
fitting process. It is possible that a different set of constraints with as
great or greater overall strength could have been chosen (e.g. by fixing
one vector exactly, instead of as close as possible to linearity; or by
making one vector orthogonal to a fixed one, instead of fixing two vectors;
or by formal orthogonality, with $A_j^* \perp B_j^*$ and $\alpha_i^* \perp \beta_i^*$). There is some
reason to expect, however, that demographic interpretation of parameters
would be more difficult in many of these cases.

For comparisons between time sequences, we incorporate the
constants $K_1$ and $K_2$ in the time parameters:

$$ \alpha'_i = K_1 \alpha_i^*, \qquad \beta'_i = K_2 \beta_i^* $$

so that each of the elements of the ith fitted fertility distribution is
expressed as

$$ F_{ij} = \alpha'_i A_j^* + \beta'_i B_j^* $$

The age vectors $A_j^*$ and $B_j^*$ for all of the time sequences will be found
in Appendix B, Table B1.

E. Relations between the age distribution of births and the components
of a standard form EHR fit

We are now ready to look closely at the two components of a fit,
$\alpha'_i A_j^*$ and $\beta'_i B_j^*$, viewing each as a vector composed of one element for
each age cut. We ask what changing each component vector, by varying
$\alpha'_i$ or $\beta'_i$, does to the age distribution of births.

The fitted X15 history, which includes all births and all women aged
15-49, is selected for this examination. The extreme values which the
elements of each component vector assume for this sequence are shown
in Table 4.


Table 4. Range of Contribution of Components $\alpha'_i A_j^*$ and $\beta'_i B_j^*$ to the
         Fitted Description of the X15 Sequence, Folded Square
         Root Scale

Time Parameter                      Component at Age Cut
                         19/20     24/25     29/30     34/35     39/40     44/45

$\alpha'_i$                         $\alpha'_i A_j^*$
  Low        1.420      -.8543    -.5559    -.1953     .1661     .5112     .8066
  Median     1.477      -.8886    -.5782    -.2031     .1728     .5317     .8389
  High       1.518      -.9132    -.5943    -.2087     .1776     .5465     .8622

$\beta'_i$                          $\beta'_i B_j^*$
  Low       -.0386      -.0111    -.0196    -.0213    -.0184    -.0127    -.0056
  High       .8427       .2419     .4276     .4656     .4006     .2773     .1212


Table 5. Age-Specific Fertility Distributions (raw scale) Based on the Age
         Parameters $A_j^*$ and $B_j^*$ for X15 and Increments in the Time
         Parameters $\alpha'_i$ and $\beta'_i$

$\alpha'_i$   1.47        1.47      1.47      1.47      1.42      1.52     Change
                                                                          by 0.1
$\beta'_i$    Change      0         0.1       0.05      0.05      0.05     0.05
              by 0.1

Age           Change                                                       Change
Group         in f(a)     f(a)      f(a)      f(a)      f(a)      f(a)     in f(a)

15-19          .0063      .0120     .0183     .0150     .0222     .0091    -.0131
20-24          .0209      .1163     .1372     .1267     .1301     .1222    -.0079
25-29          .0110      .2303     .2413     .2359     .2301     .2415     .0114
30-34         -.0055      .2622     .2567     .2596     .2508     .2683     .0175
35-39         -.0150      .2263     .2113     .2188     .2131     .2244     .0113
40-44         -.0140      .1295     .1155     .1224     .1244     .1196    -.0048
45-49         -.0037      .0235     .0198     .0216     .0294     .0149    -.0145


The $\alpha'_i A_j^*$ vector represents (on the folded square root scale used in the
fitting) an underlying pattern of cumulation of births by age. It is centered
on zero at the median age (here, 32.7 years) that the maternity schedule
would have if no $\beta'_i B_j^*$ were added. The more negative the value of
$\alpha'_i A_j^*$ at an age cut below the median age, the smaller the proportion of
total births included by that age cut; the more positive the value of $\alpha'_i A_j^*$
at an age cut above the median age, the higher the proportion of total births
included by that age cut.

The addition of the $\beta'_i B_j^*$ vector makes a non-linear alteration in the
cumulative maternity schedule expressed in $\alpha'_i A_j^*$. The result is a change
in shape, and in median age, of the schedule. A positive value of $\beta'_i$
lowers the median age, a negative value raises it.

To see readily the effects of change in $\alpha'_i$ and/or $\beta'_i$ we

. construct a series of fertility distributions based on $A_j^*$ and $B_j^*$,
and uniform increments in $\alpha'_i$ and $\beta'_i$;

. de-transform these distributions from their folded square root
scale to the raw fraction scale;

. express the distributions in non-cumulated form (Table 5).

A positive  change  in  3.  decreases  the  proportions  of  births  in  the 
four  highest  age  groups,  increases  the  proportions  in  the  three  lowest. 

On  the  other  hand,  a positive  change  in  a.  increases  the  proportions  in 
the  central  part  of  the  childbearing  years  at  the  expense  of  the  two  lowest 
and  two  highest  age  groups.  Since  a.  is  a near-constant,  any  shift  of 


Ua 


32 


? 

!< 


» 


i 


births  toward  or  away  from  the  central  ages  is  limited  to  a narrow  range. 

In fact, the full range for the XI 3 sequence is covered in Table 5.

To understand still better what sort of description we are developing,
let us shift our emphasis from the births, the outcome, to the distribution
of exposure to this outcome. If we consider each woman as a potential
activity state, and a birth as evidence of a functioning activity state, the
systematic variations expressed in $\alpha_i A_j$ and $\beta_i B_j$ seem natural
ones for fertility distributions. Whatever proportion of all potential activity
states is functioning, this activity could be distributed evenly across age
groups if women of all ages had equal opportunity of becoming active. The
proportion of women both biologically susceptible and exposed to the
possibility of pregnancy increases over the earliest childbearing ages,
however, and declines over the highest ages. A first natural variation
from uniform distribution is, therefore, some degree of concentration of
functioning activity states around the median age. This is accommodated
in $\alpha_i A_j$. [52]


A second  natural  variation  in  the  age  distribution  of  active  states  is 
asymmetry  or  skewness.  High  incidence  of  delay  in  the  onset  of  functioning 
of  potential  activity  states  and/or  extension  of  functioning  to  higher  ages 
would  contribute  negatively  to  skewness;  high  incidence  of  early  onset  of 
functioning  and/or  curtailment  of  functioning  at  higher  ages  would  contribute 
positively  to  skewness.  The  net  effect  of  these  opposed  influences  on  the 
asymmetry of the distribution is accommodated in $\beta_i B_j$.

Diversity between women in the age patterns of activity may contribute
to a third natural type of variation in overall fertility distributions: a more
even spreading of functioning activity states across a portion of the
reproductive span. A simple example might be the aggregate distribution for
two sub-populations, one of which attempts to limit its childbearing to
early ages, while the other delays all childbearing to higher ages. [53] This
third type of variation is explored in later work which fits a third
component to the fertility time history. [54]

We proceed now to examine the two components $\alpha_i A_j$ and $\beta_i B_j$
developed in the present work for each of the Swedish time sequences
analyzed.  The  next  two  sections  concentrate  on  the  68-year  sequences  for 
marital  and  overall  fertility.  We  ask  what  the  components  indicate  about 
the  distribution  of  functioning  activity  states  and  about  changes  in  their 
distribution  in  this  population. 

F. Some demographic implications of the fertility components derived
by EHR analysis (examination of the 1892-1959 data)


We approach the question of demographic significance in several
stages:

. examine the distinctive features of the EHR components of MX15,
MX20, XX15 and XX20;

. compare the EHR model parameters for marital fertility with
the parameters of the Coale marital fertility model, giving
particular attention to the "natural fertility" standard and to the
measure of "degree of control";

. dissect marital fertility further, to examine separately the changes
in distribution of functioning activity states not associated with
change in level of fertility;

. dissect from the overall distribution of functioning activity states
that portion of change not associated with change in the
distribution of marital fertility;

. derive level-compensated distributions of functioning activity
states in overall and marital perspectives;

. approximate age-specific proportions of married plus cohabiting
active women from these level-compensated distributions.

F1. EHR components of marital and overall fertility histories

For each of the 68-year marital and overall fertility sequences, the
first component, $\alpha_i A_j$, reveals the median age

    MX15    30.3
    MX20    31.0
    XX15    32.6
    XX20    33.5

that the cumulated maternity schedule would have if none of the component
$\beta_i B_j$ were added.

Sample fertility distributions constructed from the components $\alpha_i A_j$
(with $\alpha_i$ equal to the mean of its near-constant value for each sequence)
and $\beta_i B_j$ (with increasing values of $\beta_i$) can be examined in Table 6
for two of the sequences, MX15 and XX15. [55] The distributions are expressed
in  three  forms:  the  cumulated  form  on  the  folded  square  root  scale  used 
in  the  fitting;  and  both  cumulated  and  non-cumulated  forms  on  the  raw  scale. 

The extent to which the distribution of functioning activity states for
each sequence shows some underlying concentration in the central portion
of the reproductive span is most easily examined in the raw scale
non-cumulated distributions implied by $\alpha_i A_j$ alone with no $\beta_i B_j$
added. With addition of $\beta_i B_j$, the decline in median age of functioning
activity states is clear in the cumulative distributions on either scale. For
XX15, increments in $\beta_i$ over the observed 0.04 to 0.82 range result in more
than a five-year decline in this median age. The degree of asymmetry
contributed to the distribution by $\beta_i B_j$ at each increment of $\beta_i$
reflects the net imbalance in the forces which discourage or intensify the
functioning of potential activity states at lower and higher ages. Change in
asymmetry may or may not be accompanied by change in the level of fertility;
and conversely, change in level may or may not be accompanied by change in
asymmetry.
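
The decline in median age with increasing $\beta_i$ can be illustrated with a
small sketch. Because the folded square root scale is zero where the
cumulated fraction is one half, the median age is the age at which the fitted
schedule crosses zero. The code below locates that crossing by linear
interpolation between adjacent age cuts, which is an assumption of this
sketch, and it uses hypothetical age vectors rather than the fitted Swedish
ones.

```python
def median_age(cut_ages, folded):
    """Age at which a cumulated schedule on the folded square root scale
    crosses zero, by linear interpolation between adjacent age cuts."""
    pairs = list(zip(cut_ages, folded))
    for (x0, f0), (x1, f1) in zip(pairs, pairs[1:]):
        if f0 <= 0.0 <= f1:
            if f1 == f0:
                return x0
            return x0 + (x1 - x0) * (0.0 - f0) / (f1 - f0)
    raise ValueError("schedule does not cross zero within the age cuts")

# Hypothetical age cuts and age vectors (illustrative only).
CUT_AGES = [20, 25, 30, 35, 40, 45]
A = [-0.80, -0.45, -0.10, 0.25, 0.60, 0.85]
B = [0.10, 0.25, 0.30, 0.25, 0.15, 0.05]

def schedule(alpha, beta):
    """alpha*A_j + beta*B_j at each age cut."""
    return [alpha * a + beta * b for a, b in zip(A, B)]

# Median age for increasing beta with alpha held fixed.
for beta in (0.0, 0.4, 0.8):
    print(beta, round(median_age(CUT_AGES, schedule(1.1, beta)), 2))
```

With a positive $B_j$ at every cut, as assumed here, each increment in beta
raises the schedule and so moves the zero crossing to a younger age.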

The limited year to year variation in the time parameter $\alpha_i$ resulting
from re-presentation of fits for the MX15, MX20, XX15 and XX20 histories
can be examined in Figs. 8 and 9. While the $\alpha_i$'s differ in level
by sequence, the small departures of $\alpha_i$ from constancy are similar for
all sequences. The most prominent variation is the peak at 1942-45. The
MX15 and XX15 $\alpha_i$'s also show a slight rise after 1950 not seen for the
MX20 and XX20 histories.

Year to year variations in the skewness parameter $\beta_i$ for the four
time sequences include some distinctive features (Figs. 8, 9):

. a lower rate of increase in skewness of the distribution toward
the younger ages for all four sequences before 1920, higher for
all four between 1920 and 1935;

. a divergence of marital from overall fertility skewness patterns
between 1935 and 1950: the continued rise in positive skewness
for XX15 and XX20 while skewness for MX15 and MX20 fluctuates.

What would these results mean in demographic terms? If the degree
of positive skewness of the marital fertility distribution is considered to
be due mainly to a decline in functioning activity states at higher ages,
the fitted parameter suggests acceleration of this process for 15 years,
then leveling or even some regression for a 15-year period before
resumption of the trend. If the degree of positive skewness of the overall
fertility distribution is due mainly to the combined effects of limitation of
marital fertility at higher ages and the proportion married and/or entering
childbearing at the younger ages, the divergence of overall $\beta_i$ from
marital $\beta_i$ has picked up the significant increases in age-specific
proportions married which occurred between 1935 and 1950. These possibilities
are examined below at several levels of detail.


F2. Comparison of the EHR and Coale descriptions of marital fertility

For a first broad examination of demographic significance of the
EHR-derived marital fertility parameters, comparison with a model whose
parameters are considered to have demographic meaning may be useful.

We shall therefore compare the parameters of the EHR model of marital
fertility

    $F_{ij} = \alpha_i A_j + \beta_i B_j$

with those of the Coale model of marital fertility

    $r(a) = M \, n(a) \, e^{m \cdot v(a)}$

which becomes, with omission of M, the core of the Coale-Trussell model
fertility schedules for an idealized population.

Each  model  of  marital  fertility  concentrates  on  the  age  pattern  of 
fertility  apart  from  level,  but 

. $r(a)$ refers to the fertility rate for women within an age group
(e.g. 35-39), while

. $F_{ij}$ uses the normalized cumulative distribution, and therefore
refers to proportion of total fertility achieved by an age cut
(e.g. 39/40).

Each model relates its own expression of an observed fertility
schedule to a standard schedule:

. $n(a)$, an arithmetic mean of ten of the schedules identified by
Henry [56] as having a "natural" marital fertility pattern,

. $A_j$, a cumulative marital fertility pattern extracted from the
fertility history or other group of schedules analyzed.

Each model has a multiplier of its standard schedule:

. M, a variable scale factor, [57]

. $\alpha_i$, a positive-valued near-constant which is a measure of the
extent to which the cumulation of births expressed in $A_j$
accelerates toward the median age of $A_j$ and then decelerates
over higher ages.

Each expresses deviations from its standard pattern, $n(a)$ or $A_j$,
in terms of a standard pattern of departure with age:

. $v(a)$, a logarithmic departure from $n(a)$ above age 24, based on
44 recent schedules and interpreted to reflect the age pattern of
conscious introduction of behavior to control fertility after some
desired number of children is reached, [58]

. $B_j$, specific to the fertility history or other group of schedules
analyzed and including all age cuts in its roughly quadratic pattern.

Each model provides a measure of the degree of this deviation:

. m, interpreted as a measure of "degree of control" of fertility
at higher ages and expressed in the extent to which the marital
fertility distribution above age 24 is positively (or occasionally
negatively) skewed according to the age pattern of $v(a)$,

. $\beta_i$, the extent to which the whole age distribution of marital
fertility is skewed according to an age pattern determined by $B_j$.

We ask first whether the pattern empirically derived for Sweden and
expressed in $\alpha_i A_j$ is a reference standard at all comparable to the
tightly clustered family of patterns referred to as natural marital fertility
distributions. The similarity of (1) the normalized age-specific schedules
implied by $\alpha_i A_j$ for MX20 and (2) some of Henry's schedules, normalized
to concentrate on pattern, is immediately apparent in the upper portion of
Table 8 (see page 50 below) which relates mainly to analyses in a later
section. In fact, $\alpha_i A_j$ is more like some of the "natural" distributions
than the latter are like each other.

The relation is clarified when each of Henry's schedules, in
cumulated normalized form, is approximated by a weighted sum of the
Swedish $A_j$ and the Swedish $B_j$. (That is, each schedule is re-expressed
on the folded square root scale used in the EHR analysis, and regressed
on the $A_j$ and the $B_j$ found for MX20.) The degree of central
concentration, $\alpha_i$, ranges from 1.083 to 1.20, not unlike the range of
1.080 to 1.138 for MX20; degree of skewness, $\beta_i$, ranges from 0.035 to
0.218, compared to a low of 0.204 for MX20. Closeness of fit of some of
these EHR approximations to the reported "natural" distributions can be
examined in raw scale non-cumulated form in Appendix C. The relation
between natural fertility distributions and EHR-derived age patterns can
of course be more fully examined when a more general $A_j$ and $B_j$ have
been derived from a full range of fertility schedules representing a
variety of populations and periods. That fertility distributions identified
as having a "natural" pattern may be culturally determined slight
variations of a more general pattern underlying all fertility distributions is
given further support in a later section.
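
The regression of a single schedule on the two age vectors can be sketched
as follows. This is a minimal least-squares illustration, assuming a fit
without an intercept (consistent with describing the approximation as a
weighted sum of $A_j$ and $B_j$) and using hypothetical vectors; the function
name `fit_alpha_beta` is introduced here only for the example.

```python
def fit_alpha_beta(y, A, B):
    """Least-squares weights (alpha, beta) such that y_j ~ alpha*A_j + beta*B_j.
    Solves the 2x2 normal equations directly; no intercept is fitted."""
    saa = sum(a * a for a in A)
    sbb = sum(b * b for b in B)
    sab = sum(a * b for a, b in zip(A, B))
    say = sum(a * v for a, v in zip(A, y))
    sby = sum(b * v for b, v in zip(B, y))
    det = saa * sbb - sab * sab
    if det == 0.0:
        raise ValueError("A and B are collinear")
    alpha = (say * sbb - sby * sab) / det
    beta = (sby * saa - say * sab) / det
    return alpha, beta

# Hypothetical age vectors and a schedule already on the folded
# square root scale (illustrative only).
A = [-0.80, -0.45, -0.10, 0.25, 0.60, 0.85]
B = [0.10, 0.25, 0.30, 0.25, 0.15, 0.05]
y = [-0.86, -0.44, -0.05, 0.33, 0.68, 0.93]

alpha, beta = fit_alpha_beta(y, A, B)
print(round(alpha, 3), round(beta, 3))
```

The two returned weights play the roles of the degree of central
concentration and the degree of skewness for the schedule being approximated.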

We next test whether m and $\beta_i$ appear to measure the same
phenomenon in a fertility time sequence. We use the expression

    $m = \ln[\, r(a) / (M \, n(a)) \,] / v(a)$

with the values of $n(a)$ and $v(a)$ underlying the Coale-Trussell model
schedules, and with $M = r(20\text{-}24) / n(20\text{-}24)$, as in the Coale model
of marital fertility alone; and we calculate

. m for the reported 1892-1959 marital fertility schedules,

. m for the fitted schedules resulting from the EHR analysis of
this 68-year time sequence.




In order to obtain for each year a single value of m to compare with
$\beta_i$, a weighted average m is calculated, weighting the m at each age
by the proportion of the year's births occurring to that age group. [62] This
emphasizes the shape of the schedule over the central ages of childbearing,
the portion of the schedule which Coale and Trussell consider to be of
primary importance in estimating their degree of control.

Plotting $\beta_i$ for MX15 and for MX20 in the same figure with weighted
average m for fitted schedules and weighted average m for reported
schedules (Fig. 10) demonstrates that averaged m and $\beta_i$ have the same
pattern of variations over time and differ principally in scale. [63]
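
The calculation of an age-specific m and its birth-weighted average can be
sketched as below. Several points are assumptions of this sketch rather than
statements about the computation actually performed: the standard values in
`n` and `v` are illustrative placeholders and should be replaced by the
published Coale-Trussell values; the 20-24 group is left out of the average
because $v(a)$ is zero there and m is undefined; and the weights are
renormalized over the remaining age groups.

```python
import math

# Age groups 20-24, 25-29, ..., 45-49.
# Placeholder standards (illustrative only, not the published values).
n = [0.46, 0.43, 0.40, 0.32, 0.17, 0.02]
v = [0.0, -0.3, -0.7, -1.0, -1.4, -1.7]

def coale_m_by_age(r, n, v):
    """m(a) = ln[r(a) / (M n(a))] / v(a) for the age groups above 20-24,
    with M = r(20-24) / n(20-24)."""
    M = r[0] / n[0]
    return [math.log(ra / (M * na)) / va
            for ra, na, va in zip(r[1:], n[1:], v[1:])]

def weighted_average_m(r, n, v, birth_shares):
    """Average of m(a) weighted by each age group's share of the year's births.
    The 20-24 share is dropped and the remaining weights renormalized."""
    m = coale_m_by_age(r, n, v)
    w = birth_shares[1:]
    return sum(mi * wi for mi, wi in zip(m, w)) / sum(w)

# Hypothetical marital fertility rates and birth shares for one year.
r = [0.40, 0.33, 0.26, 0.18, 0.08, 0.008]
shares = [0.25, 0.27, 0.23, 0.16, 0.08, 0.01]
print(round(weighted_average_m(r, n, v, shares), 3))
```

Applying the same function to reported and to fitted schedules, year by
year, would give the two averaged m series that are compared with $\beta_i$.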


F3. Changes in the marital fertility distribution not accounted for by
changes in level of marital fertility

Now that the demographic significance of the EHR description of
marital fertility has been examined broadly, we shall go further in
dissecting the age distribution of marital fertility and in associating change
with underlying demographic factors. Specifically, we separate from the
time parameters MX$\alpha_i$ and MX$\beta_i$ that portion of change which is not
accounted for by the changes in total rate of marital fertility (MT) over
the 68 years of the history. [64]

To accomplish this separation, we use regression as exclusion,
regressing the MX$\alpha_i$ and MX$\beta_i$ for each time sequence on the
corresponding MT. [65] The regression residuals, referred to as
MX$\alpha_{i \cdot MT_i}$ and MX$\beta_{i \cdot MT_i}$ to denote the time parameters
linearly compensated for MT, are shown in Fig. 11. The MX$\alpha_{i \cdot MT_i}$
sequences suggest that very little, if any, of the small amount of variation
in MX$\alpha_i$ can be accounted for by change in MT. (Compare spread and
pattern over time in the upper portions of Figs. 8 and 11.) The regression
coefficient of MT for each sequence (Table 7) confirms this impression.
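
Regression as exclusion amounts to keeping the residuals of a simple linear
regression. The sketch below assumes an ordinary least-squares fit with a
constant term, which is what a regression coefficient and a constant would
suggest; the yearly values are hypothetical and the function name
`linear_compensate` is introduced only for this example.

```python
def linear_compensate(y, x):
    """Regress y on x by ordinary least squares with a constant term.
    Returns (coefficient, constant, residuals); the residuals are the
    values of y linearly compensated for x."""
    count = len(y)
    xbar = sum(x) / count
    ybar = sum(y) / count
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    coef = sxy / sxx
    const = ybar - coef * xbar
    resid = [yi - (const + coef * xi) for xi, yi in zip(x, y)]
    return coef, const, resid

# Hypothetical yearly values: level of marital fertility (MT) and a
# skewness parameter (beta) for six years (illustrative only).
MT = [4.0, 3.8, 3.5, 3.1, 2.9, 3.0]
beta = [0.20, 0.28, 0.41, 0.60, 0.66, 0.70]

coef, const, beta_given_MT = linear_compensate(beta, MT)
print(round(coef, 3), round(const, 3))
print([round(e, 3) for e in beta_given_MT])
```

A residual near zero means the parameter changed about as much as the
average relation with the level of fertility would predict; positive or
negative residuals mark years that departed from that relation.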

In contrast, a significant amount of the change in MX$\beta_i$ can be
accounted for by changes in MT. The relation can be judged both from
the regression coefficients (Table 7) and from comparison of the lower
portion of Fig. 8 (MX$\beta_i$ over time) with the lower portion of Fig. 11
(MX$\beta_{i \cdot MT_i}$ over time).

At the same time, the portion of change in MX$\beta_i$ not accounted for
by MT has some significant features as expressed in MX$\beta_{i \cdot MT_i}$:

. the transition from positive to negative at 1910,

. the long largely negative stretch from 1911 to 1943 (with one
notable exception, 1920),

. the increasingly positive values after 1950.

These  residuals  can  be  interpreted  in  the  following  way: 

0 indicates that change in MX$\beta_i$ is proportional to change in MT
in the opposite direction (e.g. that increase in skewness of the
distribution toward the younger ages, according to the pattern
determined by MX$B_j$, parallels the decline in MT and could
therefore be entirely accounted for by such a decline in
functioning activity states at the higher ages).


Table 7. Regression Coefficients and Constants in Linear
Compensation of EHR Time Parameters for the
Level of Fertility and for Other Selected Factors

Compensation of     For                 Regression     Constant
                                        Coefficient

MX15 $\alpha_i$     MT15                   0.002         1.221
MX20 $\alpha_i$     MT20                   0.004         1.102
MX15 $\beta_i$      MT15                  -0.516         1.664
MX20 $\beta_i$      MT20                  -0.400         0.817

XX15 $\alpha_i$     MX15 $\alpha_i$        0.470         0.892
XX20 $\alpha_i$     MX20 $\alpha_i$        0.711         0.109
XX15 $\beta_i$      MX15 $\beta_i$         1.101        -0.515
XX20 $\beta_i$      MX20 $\beta_i$         1.104        -0.193

XX15 $\alpha_i$     XT15                   0.018         1.418
XX20 $\alpha_i$     XT20                   0.000         1.182
XX15 $\beta_i$      XT15                  -0.966         0.053
XX20 $\beta_i$      XT20                  -1.098         0.000

C15 $\alpha_i$      CT15                  -0.011         1.489
C15 $\beta_i$       CT15                  -1.008         0.914


+ indicates that change in MX$\beta_i$ is greater than change in MT in
the opposite direction (e.g. that skewness toward the younger
ages according to the pattern determined by MX$B_j$ shows
greater increase than corresponds on the average to the decline
in MT, and therefore indicates a disproportionate shift of
functioning activity states from higher to lower ages).

- indicates that change in MX$\beta_i$ is less than corresponds on the
average to change in MT in the opposite direction (e.g. that the
age distribution of births is not skewed toward the younger ages
according to the pattern determined by MX$B_j$ as much as would
have been predicted from a decline in MT due entirely to decrease
in functioning activity states at the higher ages).

A demographic view of the observed variations over time may be that:

. From 1910 to 1943, the slightly disproportionate fraction of
total fertility accounted for by women at higher ages reflects
the childbearing behavior of cohorts with a high incidence of late
marriage. [66]

. After 1950, older women were limiting their fertility
disproportionately more highly than younger women, members of the
post-1936 earlier-marrying cohorts.

The derived parameter may represent, then, a separation of effects of
marriage entry pattern from the effects of birth limitation per se, in a
cross-sectional view of the aggregate age distribution of activity states
functioning within marriage.



F4.  Changes  in  the  overall  distribution  not  accounted  for  by  changes 
in  the  marital  distribution  of  functioning  activity  states 

Now we look at one of the possible dissections of the age distribution
of overall fertility. By regression, we separate from the time parameters
$\alpha_i$ and $\beta_i$ for overall fertility that portion of change which cannot
be accounted for by change in the corresponding marital fertility parameters.

Coefficients from regression of the XX$\alpha_i$'s on the corresponding
MX$\alpha_i$'s signal that their variations over time, while covering a narrow
range, are quite highly related (Table 7, page 43 above). The post-1950
tail for XX15$\alpha_i$ can be fully accounted for by MX$\alpha_i$. (Compare spread
and pattern over time of XX$\alpha_i$ in the upper portion of Fig. 9, and
XX$\alpha_{i \cdot MX\alpha_i}$ in the upper portion of Fig. 12.) A portion of the
peak in XX$\alpha_i$ in the 1940s remains in the regression residuals,
XX$\alpha_{i \cdot MX\alpha_i}$, however, and invites further investigation.

Coefficients from the regression of the XX$\beta_i$'s on the
corresponding MX$\beta_i$'s indicate a high relation between these parameters
(Table 7). At the same time, the regression residuals XX$\beta_{i \cdot MX\beta_i}$
show distinctive variations over time (Fig. 12):

. the transition from + to - at 1915,

. the decline to the most negative level of 1925-1935,

. the sharp rise between 1935 and 1945, with transition to + at
1940,

. the divergence of XX15$\beta_{i \cdot MX15\beta_i}$ from
XX20$\beta_{i \cdot MX20\beta_i}$ after 1949.




These  residuals  can  be  interpreted  in  the  following  way: 

0 indicates that change in XX$\beta_i$ is proportional to change in
MX$\beta_i$ in the same direction (e.g. that any increase in positive
skewness of the overall fertility distribution could be entirely
accounted for by change in degree of positive skewness of the
marital fertility distribution);

+ indicates that change in XX$\beta_i$ is greater than corresponds on
the average to change in MX$\beta_i$ in the same direction (e.g. that
the combined effects of change in proportion entering marriage
by age and the proportion by age having an illegitimate child
have contributed to greater positive change in skewness of the
overall fertility distribution than can be accounted for by positive
change in skewness of the marital fertility distribution);

- indicates that change in XX$\beta_i$ is less than corresponds on the
average to change in MX$\beta_i$ in the same direction (e.g. that
positive skewness of overall fertility shows lesser increase
than does positive skewness of marital fertility, suggesting that
the proportion of women married and/or entering childbearing
at younger ages is not increasing as rapidly as the distribution
of marital fertility is shifting toward the younger ages).

A demographic view of the variations pictured in Fig. 12 may be that:

. To a small extent before 1915, and to a greater extent after 1940,
the age distribution of overall fertility in these time sequences is
positively influenced by the age pattern of entry into cohabitation
(and, in the final years of the sequences, is possibly also
influenced by recent increases in marriage dissolution);

. The domination of the age distribution of overall fertility by the
age distribution of marital fertility, extending from 1915 to
1940 and at its greatest between 1925 and 1935, declines rapidly
with the 1935-1948 shift of the marriage pattern to younger ages;

. After 1948, a sustained positive influence of the age pattern of
entry into cohabitation for XX15, but not for XX20, may result
from increased relative importance of illegitimate births at
age 15-19 when total fertility rate is low.

An  approach  for  approximating  age-specific  proportions  of  married 
plus  cohabiting  active  women  is  developed  below,  beginning  with  level- 
compensated  distributions  of  overall  and  marital  functioning  activity  states. 

F5. Level-compensated distributions of functioning activity states in
overall and marital perspectives

In a preceding section (see page 41 above), we used regression to
exclude from the time parameters, MX$\alpha_i$ and MX$\beta_i$, the effects of
change in level of marital fertility over the 68 years. Level-compensated
variation was expressed in the resulting vector pairs of regression
residuals MX$\alpha_{i \cdot MT_i}$ and MX$\beta_{i \cdot MT_i}$. We found the
variation in MX$\alpha_i$ to be highly independent of change in MT, and found
MX$\beta_i$ to have systematic variations not associated with change in MT.
Now, the pairs of residual vectors become the basis for construction of
year-by-year distributions of functioning activity states, freed of
association with change in the level of fertility.

First, the level-compensated time vector MX$\alpha_{i \cdot MT_i}$ is centered
on the constant, $k_1$, from the regression of MX$\alpha_i$ on MT, so that
MX$\alpha_i$ is compensated to a standard level of fertility. Then each element
of this vector ($k_1$ + MX$\alpha_{i \cdot MT_i}$) is multiplied by each element
of the age vector $A_j$, and each element of the level-compensated vector
MX$\beta_{i \cdot MT_i}$ is multiplied by each element of the age vector $B_j$ to
form the pairs of components for each year at each age cut. Summing the pairs
of components within each cell of the matrix then gives a time sequence of
standard-level-compensated distributions of functioning activity states
for the marital fertility history. For each year i and age cut j, the
complete level-compensated element is:

    $M_{ij} = (k_1 + MX\alpha_{i \cdot MT_i}) A_j + MX\beta_{i \cdot MT_i} B_j$

where MX$\alpha_{i \cdot MT_i}$ and MX$\beta_{i \cdot MT_i}$ = MX$\alpha_i$ and
          MX$\beta_i$ linearly compensated for MT

      $k_1$ = constant from regression of MX$\alpha_i$ on MT
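
The construction of the level-compensated matrix can be sketched directly
from the element formula. In this illustration the constant 1.102 is the
Table 7 constant for MX20 $\alpha_i$ regressed on MT20; the residual values and
the age vectors are hypothetical, and the function name is introduced only
for the example. The result is still on the folded square root scale.

```python
def level_compensated_matrix(k1, alpha_resid, beta_resid, A, B):
    """Rows are years i, columns are age cuts j:
    M[i][j] = (k1 + alpha_resid[i]) * A[j] + beta_resid[i] * B[j]."""
    return [
        [(k1 + a_res) * a + b_res * b for a, b in zip(A, B)]
        for a_res, b_res in zip(alpha_resid, beta_resid)
    ]

# Constant from Table 7 (MX20 alpha_i on MT20).
k1 = 1.102

# Hypothetical regression residuals for three years and hypothetical
# age vectors for six age cuts (illustrative only).
alpha_resid = [0.010, -0.004, 0.002]
beta_resid = [-0.050, 0.000, 0.060]
A = [-0.80, -0.45, -0.10, 0.25, 0.60, 0.85]
B = [0.10, 0.25, 0.30, 0.25, 0.15, 0.05]

M = level_compensated_matrix(k1, alpha_resid, beta_resid, A, B)
for row in M:
    print([round(x, 4) for x in row])
```

Each row can then be taken back to the raw fraction scale and de-cumulated
to give an age-specific distribution for that year.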


We can construct the corresponding standard-level-compensated
distributions for each overall fertility history after regression of the
XX$\alpha_i$'s and XX$\beta_i$'s on the corresponding XT's. For each year i
and age cut j, the overall fertility level-compensated element is

    $X_{ij} = (k + XX\alpha_{i \cdot XT_i}) A_j + XX\beta_{i \cdot XT_i} B_j$

where XX$\alpha_{i \cdot XT_i}$ and XX$\beta_{i \cdot XT_i}$ = XX$\alpha_i$ and
          XX$\beta_i$ linearly compensated for XT

      $k$ = constant from regression of XX$\alpha_i$ on XT

These expressions, $M_{ij}$ and $X_{ij}$, on the folded square root scale used
in the EHR analysis, are more useful to us once expressed on the raw fraction
scale, and de-cumulated to give the standard-level-compensated
age-specific overall and marital fertility distributions $X_{ij}$ and $M_{ij}$.

For marital fertility, we can compare some of these
level-compensated patterns with those implied by $\alpha_i A_j$ alone, and with
some of those identified as having a "natural" fertility pattern. Inspection
of Table 8 reveals both similarities (e.g., Greenland 1901-1930 and Sweden
1956; Hutterite 1921-1930 and Sweden 1951) and some small but distinctive
differences (e.g., the lower proportion of functioning activity states at
ages 25-29, also higher proportion at ages 40-44, in the earlier Swedish
schedules than in all others, even the later Swedish). We appear to be
dealing with a close family of distributions reflecting culture-specific
slight variations of a more general underlying pattern, on which, then,
the pattern of decline of marital fertility with age is imposed with varying
intensity.

For  overall  fertility,  the  level-compensated  patterns  (examples  in 
Table  9)  should  reflect  both 

• the  timing  of  entry  into  childbearing,  and 




Table 8. EHR Standard Distributions, EHR Level-Compensated
Distributions and "Natural" Distributions of Marital
Fertility, Selected Examples

                                            Age Group
Source of Distribution        20-24  25-29  30-34  35-39  40-44  45-49  5Σf(a)

EHR Standard $\alpha_i A_j$
 (with lowest $\alpha_i$, 1.080)   .2473  .2110  .2001  .1801  .1284  .0241
 (with highest $\alpha_i$, 1.138)  .2348  .2212  .2107  .1961  .1238  .0134

Hutterites (1)
 Marriages 1921-1930          .2914  .2204  .2043  .1896  .1019  .0279   10.9
 Marriages before 1921        .2429  .2302  .2169  .1009  .1046  .0148    ?.8

Canada (1)
 Marriages 1700-1730          .2398  .2203  .2242  .180?  .1070  .0139   10.8

Norway (1)
 Marriages 1874-1876          .2434  .2336  .2096  .1776  .1106  .0292    8.1

Greenland (2)
 1891-1900                    .246   .230   .223   .180   .008   .019     7.70
 1901-1930                    .271   .220   .208   .160   .104   .018     7.84

Taiwan
 (women born c. 1900)         .2626  .2403  .2201  .1802  .0820  .0058    6.95

Sweden
 $f(A_j, B_j)$ 1899           .2476  .2180  .2044  .1801  .1228  .0181    7.76
 $f(A_j, B_j)$ 1921           .2431  .2122  .2024  .1019  .1289  .0221    9.90
 $f(A_j, B_j)$ 193?           .2372  .200?  .2029  .104?  .1331  .0233    3.28
 $f(A_j, B_j)$ 1951           .2934  .2108  .2039  .1860  .1109  .0177    3.07
 $f(A_j, B_j)$ 1956           .2723  .2280  .2020  .1790  .1071  .0147    3.12

(1) Data from Henry, loc. cit. in footnote 18.

(2) Data from Hansen, H.O. "From Natural to Controlled Fertility: Studies
in Fertility as a Factor of the Process of Economic and Social
Development in Greenland c. 1891-1979," presented at the IUSSP Seminar on
Natural Fertility, Paris, March 1977.




• those culture-specific factors which determine the timing of
functioning of activity states subsequent to a first birth.

These  constructed  distributions,  freed  of  association  with  level  of 
activity  while  retaining  association  with  timing  of  activity,  become  the 
basis  of  approximations  of  age-specific  proportions  of  married  plus 
cohabiting  active  women. 


F6. Use of standard-level-compensated distributions of births to
approximate age-specific proportions of married plus cohabiting
active women

Actual populations often depart from the hypothetical idealized one
in experiencing premarital pregnancy and illegitimacy at significant levels.
Social customs of the time and place determine the size and age
composition of the non-married group exposed to pregnancy, and, for members
of this group, strongly influence the outcome of pregnancy, in terms of
abortion, or an illegitimate birth, or a conception legitimated by marriage
before the birth. With a given proportion of unmarried women exposed
to pregnancy, quite different age distributions of marital fertility can
result, depending on:

. the separation by age into those marrying and those not marrying
before giving birth to a non-maritally conceived child, and

. the relation, in magnitude and age distribution, between the
pre-maritally conceived legitimate births and post-maritally conceived
births.


Table 9. EHR Level-Compensated Distributions of
Cross-Sectional Overall Fertility, Selected Years

                                              Age Group
Source of Distribution        20-24   25-29   30-34   35-39   40-44   45-49   5 Σ f

EHR Standard $\alpha_i A_j$   .1044   .2136   .2571   .2471   .1556   .0222
f($A_j$, $B_j$)  1895         .1020   .2118   .2572   .2489   .1570   .0226   4.07
f($A_j$, $B_j$)  1921         .0988   .2034   .2531   .2513   .1656   .0280   2.83
f($A_j$, $B_j$)  1935         .0771   .1903   .2560   .2673   .1811   .0282   1.61
f($A_j$, $B_j$)  1951         .1471   .2390   .2517   .2170   .1259   .0183   2.11
f($A_j$, $B_j$)  1956         .1701   .2836   .2194   .2027   .1096   .0147   2.00

Resulting problems for aggregate fertility models can be severe, both in
description of the age distribution of marital fertility and in expression
of age-specific proportions of cohabiting women. Age 15-19 births, or
births at marriage durations of less than two years, are often omitted to
lessen or avoid the effects of non-maritally conceived births on model
descriptions of fertility. The demographic importance of both the timing
of first birth and the length of the first interbirth interval argues strongly,
however, for seeking ways to include in analysis these portions of aggregate
data which cover the beginning of childbearing for a significant proportion
of women in many populations.

We  have  already  seen  that  EHR  analysis  can  provide  excellent  fits 
to  very  diverse  distributions,  including  marital  fertility  distributions 
highly  influenced  by  premarital  pregnancy.  Further  exploration  suggests 
informative  ways  of  dealing,  in  the  aggregate,  with  those  functioning 
activity  states  which  lead  to  illegitimate  births  rather  than  to  precipitated 
marriages.  At  this  stage,  we  use  level-compensated  distributions  of 
functioning activity states as the basis for approximating age-specific
proportions  of  married  plus  cohabiting  active  women  (referred  to  as  MPA 
women  for  brevity).  That  is,  of  the  possible  combinations  of  exposure 
and  activity  status: 


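[Two-way table of exposure by activity status. Columns: Married; Not married
but cohabiting; Not cohabiting. Rows: Functionally active; Functionally
inactive. Cells are numbered, with X marking an excluded cell.]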




we approximate the age-specific proportions of (1 + 2 + 3). We illustrate
the  approach  with  the  conventional  overall  and  marital  fertility  sequences 
analyzed  above  (and  then  suggest  other  pairs  of  sequences  whose  analysis 
may  further  aid  understanding  of  the  phenomenon). 

The total rates of overall births and legitimate births for each year
are dispersed across the corresponding level-compensated patterns $X'_{ij}$
and $M'_{ij}$ derived for that year. This provides, first of all, a
level-compensated overall fertility rate and a level-compensated marital
fertility rate for each age group in each year. An approximation of the
proportion of MPA women in year i and age group j is then calculated as

$$p_{ij} = \frac{X'_{ij}\,(XT)(5)}{M'_{ij}\,(MT)(5)}$$

where (XT)(5) = total rate of overall fertility expressed as
                mean number of children per woman over the
                reproductive span of a synthetic cohort

      (MT)(5) = total rate of marital fertility expressed as
                mean number of children per married woman
                over the reproductive span of a doubly synthetic
                cohort
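
The calculation just described can be sketched in a few lines of Python.
This is a minimal illustration under our own assumptions, not the program
used for the analysis: the function name, the argument names, and the
numerical inputs are all hypothetical, and the two totals are taken to be
already expressed as (XT)(5) and (MT)(5).

```python
def mpa_proportions(x_pattern, m_pattern, overall_total, marital_total):
    """Approximate age-specific proportions of MPA women for one year.

    x_pattern, m_pattern: level-compensated overall and marital patterns,
        one value per age group (the X'_ij and M'_ij of the text).
    overall_total, marital_total: total fertility rates, (XT)(5) and (MT)(5).
    """
    if len(x_pattern) != len(m_pattern):
        raise ValueError("patterns must cover the same age groups")
    proportions = []
    for x_share, m_share in zip(x_pattern, m_pattern):
        overall_rate = x_share * overall_total   # level-compensated overall rate
        marital_rate = m_share * marital_total   # level-compensated marital rate
        proportions.append(overall_rate / marital_rate)
    return proportions


# Hypothetical inputs for age groups 20-24 ... 45-49 (not taken from the report).
x_pattern = [0.10, 0.20, 0.26, 0.25, 0.16, 0.03]
m_pattern = [0.28, 0.25, 0.20, 0.15, 0.09, 0.03]
example = mpa_proportions(x_pattern, m_pattern,
                          overall_total=2.0, marital_total=4.0)
```

Because both totals carry the same factor of 5, that factor cancels in the
ratio; it is kept in the notation only so that each total reads as a mean
number of children per woman.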

Approximations based on XX20 and MX20 parameters are shown in
Fig. 13 for three age groups which have experienced striking alteration
of marriage pattern over the 68 years of the time sequence analyzed. The
significant change in age-specific proportions married, beginning in 1935,
and the deceleration in change after 1945 are clearly picked up by the
procedure.


The approximations can be improved, however, by EHR analysis
of at least two additional sequences:

. a legitimate fertility sequence (calculated from legitimate
births by age of mother, and the total female population in each
age group) to pair with marital fertility; and

. a "married or actively cohabiting" fertility sequence (which
simply includes all non-marital functioning activity states on the
same basis, whether or not they precipitate marriage before
the birth of a non-maritally conceived child) to pair with overall
fertility.

The immediate goal in refining and extending the analysis is to develop
as concise and useful a description as the data will allow of a population's
experience of childbearing, within, outside of, and influencing legal
marriage. Taking advantage of the richness of the data for this one population
and of the strengths of the EHR approach to the data, the larger goals are
at least two:

. to expand understanding of the outcome of pregnancy
in the early years of childbearing in the context of a particular
society changing over time;

. to tease out more information on the age-specific proportions
cohabiting which are apt to underlie the observed fertility rates
for the early portion of the reproductive span within the context
of a population's complete age pattern of childbearing.


G. Changes across time in cohort and cross-sectional overall fertility
patterns (selected observations on the full 1775-1959 fertility
histories)

Demographic implications of the EHR components have been examined
at several depths, using the data of extraordinarily high quality for the
68-year cross-sectional overall and marital fertility sequences. Now we
turn to the 185-year overall fertility history in cross-sectional and cohort
perspectives. The analyses of the X15, X20, C15 and C20 time sequences
provide several unusual opportunities:

. to examine, in EHR model terms, the relation between changes
in cohort and cross-sectional distributions of functioning
activity states over an extended period;

. to discover how much can be deduced about change from EHR
analysis of overall fertility alone;

. to see how EHR analysis handles data of somewhat lesser
accuracy in combination with data of very high quality.

G1. Comparability of cohort and cross-sectional EHR components

A fertility  model  may  reasonably  be  questioned  on  its  relative  capacity 
to  handle  cross-sectional  and  cohort  data.  Each  of  these  perspectives  on 
fertility  distributions  provides  its  own  challenge  to  appropriate  description. 

A cross-sectional slice is the composite of a small portion of the
childbearing experience of each of many cohorts whose diverse histories influence
their behavior at any given period. The period in which the cross-sectional
slice is taken then has its own influence on the experience of all of the
cohorts of women passing through the period at various ages.

For the fertility rate matrix, submitted to EHR analysis in cross-section
(by year) and on the diagonal (by cohort), three of the possible
outcomes are that

. the underlying similarities of the data in cohort and cross-sectional
forms may dominate the fitted parameters, leaving
in the residuals most of the divergence of the two;

. the systematic differences may be well captured in the separate
sets of fitted parameters for cohort and cross-sectional sequences;

. either "true" similarities or "true" differences between cohort
and cross-sectional underlying regularities may be obscured by
diffusion through fitted parameters and residuals for one or both
sequences.

We found above (see pages 12 to 25) that the present EHR analysis

. has given the overall fertility sequences close fitted descriptions
in cohort perspective, and still closer fitted descriptions
in cross-sectional perspective;

. has left a different type of regularity in the small residuals
from the cross-sectional analysis than in those from the cohort
analysis.

Looking for systematic differences in the cohort and cross-sectional
fitted parameters is more profitable after fine-tuning the fits. Here,
we concentrate on the similarities of the pairs of components $\alpha_i A_j$
and $\beta_i B_j$ so far derived using the folded square root re-expression.
We give particular attention to the X15 and C15 time histories, and
consider those for X20 and C20 in less detail.

The components $\alpha_i A_j$ and $\beta_i B_j$ underlying the X15 sequence
have already been examined above (see pages 29 ff.) to establish the function
of each component in describing a fertility time history. Here we construct
the same model distributions for the C15 sequence so that Table 10 can be
compared directly with Table 4, and Table 11 with Table 5.

We see that $\alpha_i A_j$ has a slightly wider range for C15 than for X15,
and $\beta_i B_j$ has a slightly lower and narrower range for C15 than for X15
(Tables 10 and 4). When we turn to Tables 11 and 5, however, we find
pronounced similarity for X15 and C15 in:

. the raw scale schedules defined by a given level of $\alpha_i$ and
$\beta_i$, and in

. the magnitude and direction of change in the distribution as
each component is varied in even increments while the other is
held constant.

This means that change in these two histories over time can reasonably
be compared below by means of these EHR time parameters.
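
The step from the folded square root scale of Table 10 to the raw scale
schedules of Table 11 can be sketched as follows. The sketch rests on
assumptions of ours: that the fitted value at each age cut is
sqrt(F) - sqrt(1 - F) for the cumulated proportion F of births, and that
$A_j$ and $B_j$ may be backed out of Table 10 by dividing the median
$\alpha_i A_j$ row by 1.480 and the high $\beta_i B_j$ row by .6427. The
table values are entered as we read them; the function names are
hypothetical.

```python
import math

# Table 10 (C15), age cuts 19/20 ... 44/45, as we read the entries.
ALPHA_MEDIAN = 1.480
ALPHA_A_MEDIAN = [-0.8830, -0.5754, -0.1979, 0.1802, 0.5387, 0.8479]
BETA_HIGH = 0.6427
BETA_B_HIGH = [0.1282, 0.3019, 0.3545, 0.3177, 0.2362, 0.1018]

# Age parameters backed out of the table rows (our derivation).
A = [value / ALPHA_MEDIAN for value in ALPHA_A_MEDIAN]
B = [value / BETA_HIGH for value in BETA_B_HIGH]


def unfold(g):
    """Invert g = sqrt(F) - sqrt(1 - F) for a cumulated proportion F."""
    return 0.5 * (1.0 + g * math.sqrt(2.0 - g * g))


def raw_schedule(alpha, beta):
    """Proportions of births in age groups 15-19 ... 45-49 (raw scale)."""
    cumulated = [unfold(alpha * a + beta * b) for a, b in zip(A, B)]
    bounds = [0.0] + cumulated + [1.0]
    return [upper - lower for lower, upper in zip(bounds, bounds[1:])]


schedule = raw_schedule(1.47, 0.05)
```

On our reading of the tables, `raw_schedule(1.47, 0.05)` agrees to within
about .001 with the column of Table 11 for $\alpha_i$ = 1.47 and
$\beta_i$ = 0.05. The inverse used in `unfold` follows from
(sqrt(F) + sqrt(1 - F))^2 = 2 - g^2, so that 2F - 1 = g sqrt(2 - g^2).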

G2. Some intersections of cohort and cross-sectional age distributions
of functioning activity states, viewed through EHR components

Before superimposing the cohort and cross-sectional time parameters,
we look at the parameters for each sequence individually, to note the

Table 10. Range of Contribution of Components $\alpha_i A_j$ and $\beta_i B_j$
to the Fitted Description of the C15 Sequence, Folded
Square Root Scale

Time Parameter                  Component at Age Cut
                      19/20    24/25    29/30    34/35    39/40    44/45

   $\alpha_i$                       $\alpha_i A_j$
Low      1.396       -.8325   -.5424   -.1866    .1699    .5079    .7994
Median   1.480       -.8830   -.5754   -.1979    .1802    .5387    .8479
High     1.544       -.9209   -.6000   -.2064    .1879    .5618    .8843

   $\beta_i$                        $\beta_i B_j$
Low     -.0634       -.0126   -.0298   -.0350   -.0313   -.0233   -.0100
High     .6427        .1282    .3019    .3545    .3177    .2362    .1018

Table 11. Age-Specific Fertility Distributions (raw scale) Based
on the Age Parameters $A_j$ and $B_j$ for C15 and Increments
in the Time Parameters $\alpha_i$ and $\beta_i$

$\alpha_i$               1.47     1.47     1.47     1.42     1.52    Change by 0.1
$\beta_i$  Change by 0.1  0        0.1      0.05     0.05     0.05

Age      Change                                                      Change
Group    in f(a)   f(a)     f(a)     f(a)     f(a)     f(a)     in f(a)

15-19     .0046    .0135    .0181    .0157    .0230    .0097    -.0133
20-24     .0211    .1169    .1380    .1274    .1307    .1231    -.0076
25-29     .0131    .2319    .2450    .2386    .2327    .2443     .0116
30-34    -.0043    .2631    .2588    .2611    .2524    .2699     .0176
35-39    -.0145    .2247    .2102    .2175    .2118    .2229     .0111
40-44    -.0159    .1282    .1124    .1202    .1224    .1170    -.0054
45-49    -.0039    .0216    .0177    .0196    .0272    .0132    -.0140

distinctive  features  (Figs.  14,15). 

The pairs of $\alpha_i$'s (for X15 and X20, for C15 and C20) differ
principally in level. The narrow range of departures of $\alpha_i$ from constancy
is similar in all of the histories (except for some of the post-1914 cohorts).
The pairs of $\beta_i$'s (for X15 and X20, for C15 and C20) also differ
principally in level.

Two periods of transition are suggested for X15 and X20, three for
C15 and C20:

. a prominent increase in positive skewness of the distribution
at 1885-1895 for X15 and X20 (Fig. 14, lower section) and for
those cohorts starting their childbearing at ages 15-19 about
1860-1865 and therefore in the latter half of their childbearing
years in 1885-1895 (Fig. 15, lower section) -- suggesting
parity dependent limitation of births;

. a turn toward negative skewness at 1835-1845 for X15 and X20
(Fig. 14, lower section) and for those cohorts starting their
childbearing at ages 15-19 in 1830-1840 and therefore aged
20-35 in 1835-1845 (Fig. 15, lower section) -- suggesting increase
in age of marriage around 1840;

. an abrupt change in cohort behavior beginning with those cohorts
aged 15-19 after 1914 (as judged by both $\alpha_i$ and $\beta_i$).

This last transition has already been signaled in the cohort residuals
(Fig. 6). That these variations may reflect new patterns in the cohort
age distribution of functioning activity states is suggested by the combination
of stability in post-1915-cohort total rate of fertility after the earlier
fairly steady decline, and the rapid evolution which these cohorts
experienced in age pattern of first marriage, in striking departure from the
cohorts aged 15-19 before 1913. (That specifying a different combination
of data re-expression and fitting may better accommodate such variations
has already been suggested.)

In the context of the full 185-year sequence of $\alpha_i$'s and $\beta_i$'s for
the X15 and X20 histories, the periods 1910-1920 and the period around
1940 might be thought small aberrations in an ongoing trend. One will
recall, however, that these periods were identified in earlier portions of
the analysis as transitions, appearing to involve both change in marital
fertility patterns and change in the age pattern of entry into cohabitation.
(Recall Fig. 1, evidence as early as the EHR fitting; also Fig. 11 on
linear compensation of the marital fertility sequence parameters for total
rate of fertility and Fig. 12 on linear compensation of the overall fertility
sequence parameters for marital fertility.)

For a more vivid impression of some intersections of cohort and
cross-sectional experience, we superimpose the pairs of time vectors.

For example, with C15 $\beta_i$ centered on year at age 30-34 (Fig. 16A), one
sees that, before 1908, the extent to which the cross-sectional age
distribution of functioning activity states was changing in degree and direction of
skewness was very similar to the extent to which cohorts then at the
central ages of childbearing were shifting their age pattern of activity to
younger or older ages. The patterns over time then diverge sharply. If
the later cohort values are shifted along the
time line -- (Fig. 16B shows a shift of 12 years, so that these cohort values
are then superimposed on those for X15 $\beta_i$ for the years 1920 and after) --
the patterns of change in the two skewness vectors are again
extraordinarily similar for about 25 years before parting abruptly once more about
1945.


We learn more about the relation of these patterns by a further
dissection of the cohort parameter $\beta_i$. By LS regression of C15 $\beta_i$ on
CT, we separate into the residuals that portion of change in $\beta_i$ not
associated with change in level of fertility (Fig. 17). (The regression
coefficient and constant are included in Table 7, page 43.) The cohorts
with positive residuals are those whose distribution of functioning activity
states is skewed more toward the younger ages than would correspond on
the average to the level of fertility; the cohorts with negative residuals
are those with later age distributions of activity than would correspond
on the average to the level.
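
The separation just described amounts to an ordinary least-squares line
with the residuals kept as the quantity of interest. A minimal sketch
follows; the function name is hypothetical, and the cohort totals and
$\beta_i$ values are invented for illustration, not taken from the C15
history or from Table 7.

```python
def ls_residuals(level, skew):
    """Least-squares regression of skew on level; return fit and residuals."""
    n = len(level)
    mean_level = sum(level) / n
    mean_skew = sum(skew) / n
    sxx = sum((x - mean_level) ** 2 for x in level)
    sxy = sum((x - mean_level) * (y - mean_skew) for x, y in zip(level, skew))
    slope = sxy / sxx
    constant = mean_skew - slope * mean_level
    residuals = [y - (constant + slope * x) for x, y in zip(level, skew)]
    return slope, constant, residuals


# Hypothetical cohort total rates (CT) and cohort beta values.
cohort_total = [4.2, 4.0, 3.7, 3.3, 2.9, 2.4, 2.0, 1.9]
cohort_beta = [0.02, 0.05, 0.09, 0.16, 0.20, 0.33, 0.41, 0.38]

slope, constant, residuals = ls_residuals(cohort_total, cohort_beta)

# Cohorts with positive residuals: earlier age pattern than the level implies.
earlier_than_level = [k for k, r in enumerate(residuals) if r > 0]
```

A resistant line could be substituted for the least-squares fit without
changing the reading of the residuals.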

If we direct our attention to the cohorts aged 15-19 in 1892-1917 (and aged
30-34 in 1907-1932), we see that the divergence of cohort from cross-sectional
pattern over time in Fig. 16A begins precisely with those cohorts which
adopted an earlier than average age pattern of level-compensated activity.

The almost-superimposable behavior for 25 years in Fig. 16B includes all of
these cohorts with the earlier pattern, and the divergence at 1945 is
precisely with those cohorts which reverted to a later-than-usual age pattern of
level-compensated activity. (One may note that for the cohorts covered by
Fig. 16B, total rate of fertility, expressed as mean number of children per
woman, dropped from 3.50 to 1.88, while cross-sectional total rate in the
years 1920-1945 dropped from 3.23 to 1.70 and rose again to 2.63.)

Comparison of the cohort and cross-sectional sequences of full
level-compensated distributions of functioning activity states (as for the
68-year marital and overall fertility sequences, pages 41 to 51 above)
is reserved for a later discussion of the parameters resulting from
fine-tuning the fits.

G3.  Cohort  vs.  cross-sectional  evidence  for  data  points  of  lesser 
accuracy 

A major  benefit  of  EHR  analysis  of  fertility  distributions  was  expected 
to  be  the  capacity  to  extract  the  underlying  regularities  while  revealing  the 
points  or  periods  of  departure  from  the  trend  or  the  usual  pattern.  We 
have  concentrated  in  preceding  sections  on  the  regularities  and  their 
further  dissection,  in  order  to  learn  more  about  the  dynamics  of  change 
over  the  time  sequences.  Here  we  open  the  question  of  the  handling  of 
data  of  lesser  accuracy. 

Some errors, particularly in the population data before about 1840,
are thought to exist in the yearly data from Grunddragen used to calculate
the pre-1875 age-specific fertility distributions. (The possibility of
other common types of error -- omissions of births, misplacement of
births in time, misplacement of women by age -- must also be considered.)
From the EHR analysis, what general evidence of such errors is there?

The EHR-derived time parameters for X15 and X20 do not exhibit
much year-to-year irregularity (Fig. 14). The only singular departures
are in $\beta_i$ for 1783 and 1792. (Fig. 5 shows these two years to be aberrant
in the residuals also.) In contrast, the total rate of fertility fluctuates
rather widely from year to year before 1870. It appears that the age
distribution of births in cross-section may be fairly accurately represented
by the recorded data, even though the level of fertility may be variously
in error for the early years.

Cohort histories constructed from the cross-sectional age-specific
rates would reflect any year-specific errors in level in two ways:

. The less accurate rates for one year would be disseminated
across seven cohorts at one five-year age group in each cohort.
Thus the age distribution of births in each of the cohorts would
be distorted in a different way.

. The cohort total rate, the sum of the more or less accurate
age-specific rates across the seven age groups for a cohort,
would serve to average out errors in cross-sectional total rate.
Cohort total rate should thus follow a smoother course than
does cross-sectional total rate.
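
Both effects can be seen in a small sketch that reads cohort schedules
off the diagonal of a table of yearly rates. The layout (one list of seven
five-year age-group rates per calendar year), the function name, and all
the numbers are assumptions made for illustration; the schedule is held
constant so that the only disturbance is a 10 percent error in level
placed in a single year.

```python
def cohort_schedules(period_rates, n_groups=7, step=5):
    """Read cohort schedules off the diagonal of a year-by-age rate table.

    period_rates maps a calendar year to rates for ages 15-19 ... 45-49.
    Each cohort is keyed by the year in which it was aged 15-19.
    """
    cohorts = {}
    for start in sorted(period_rates):
        years = [start + step * k for k in range(n_groups)]
        if all(year in period_rates for year in years):
            cohorts[start] = [period_rates[year][k]
                              for k, year in enumerate(years)]
    return cohorts


# Hypothetical constant schedule, with a level error confined to 1830.
base = [0.02, 0.12, 0.18, 0.16, 0.11, 0.05, 0.01]
period = {year: list(base) for year in range(1800, 1861)}
period[1830] = [rate * 1.10 for rate in base]

cohorts = cohort_schedules(period)

# Which cohorts are touched, and at which age group (0 = ages 15-19)?
affected = {}
for start, schedule in cohorts.items():
    touched = [k for k, rate in enumerate(schedule) if rate != base[k]]
    if touched:
        affected[start] = touched

# Error in total rate (children per woman): one year versus any one cohort.
period_error = 5 * (sum(period[1830]) - sum(base))
largest_cohort_error = max(5 * (sum(cohorts[s]) - sum(base)) for s in affected)
```

Here the single bad year reaches seven cohorts, each at a different age
group, while no cohort total is disturbed by more than a fraction of the
error in the cross-sectional total for that year.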

The picture given by the EHR cohort analysis is consistent with such a
dissemination of cross-sectional error. In conjunction with the five-year
lagged pattern in the residuals (Fig. 6), jumpiness is evident in both $\alpha_i$
and $\beta_i$ at about five-year intervals over the early portion of the C15 and
C20 histories (Fig. 15). At the same time cohort total rate is much less
irregular than cross-sectional total rate.

This outcome does not, of course, prove error, since circumstances
which actually altered fertility rates in a period would have a similar
impact on aggregate data. If volition is significant in determining when
potential activity states are functioning, cohort-to-cohort variation in the
timing of births over the reproductive span would lead to irregularity in
cross-sectional total rate even when differences between cohorts in average
number of births per woman are small. The lagged pattern seen earlier
in the cohort residuals (Fig. 6) may, then, have different causes (or a
combination of causes) in different portions of this time history. One approach,
in this instance, may be to work backwards -- reconstructing cross-sectional
rates from cohort rates after fine-tuning the EHR cohort fits. Broad
understanding of fertility data and knowledge of the population's social history
can be important aids in exploring choices in such a process.

H.  Conclusions 

Empirical higher rank (EHR) analysis proves to be a powerful means
of extracting the patterns which unify a long and varied time series of
age-specific fertility schedules. Analyses of data in single-year time
sequence demonstrate that dynamics of change as well as variety of
pattern can be captured in the fitted descriptions.

By emphasizing the centrality of examination of residuals in achieving
optimal fits and in interpretation of the fitted descriptions, the robust/
resistant and data-guided analyses reported here

. provide unusually close fitted descriptions of the diverse age
distributions of overall and marital fertility in both cross-sectional
and cohort perspectives;

. take some major steps in reducing the variability in the fertility
data to a concise and coherent demographic picture which differs
in important ways from the descriptions other aggregate fertility
models have provided, while having some significant relations to
other models;

. suggest ways of refining still further the fitted descriptions to
provide additional insight into the underlying structure of aggregate
fertility distributions, and ways of identifying and dealing
with error-ridden data.

Demographically guided choice of a standard form in which to use the
fitted descriptions so far developed leads to separation of the fertility
distributions in each time sequence (overall or marital) into three
components:

. a nearly-fixed pattern of cumulation of births with age, on which
are imposed the major variations in the distribution, due to
change in level of fertility or to the timing of births;

. a component which comprises the association of change in the level
of fertility with change in its age distribution;

. a component which encompasses effects of the timing of births,
apart from level, on the age distribution of fertility.

This  separation  proves  to  be  an  effective  one  in  efforts  to  discern  in  the 
aggregate  the  relative  contributions  of  the  age-specific  proportions  of 
women  cohabiting  and  the  age  patterns  of  childbearing  of  cohabiting  women. 


We present here not a "finished" model but informative and provocative
steps in the continuing search for useful and more refined ways of
looking at the full diversity of fertility patterns in changing social milieux.

The  success  in  developing  a sound  description  of  a long  and  varied  fertility 
history  encourages  a full  exploratory  analysis  with  extension  of  this 
EHR-based  work  to  cross-population  comparisons  of  fertility  distributions. 


FOOTNOTES 


The work on which this paper is based was begun at the Office of
Population Research and continued in the Department of Statistics at
Princeton University. I received useful criticisms from Ansley J.
Coale, Norman B. Ryder, and Barbara A. Anderson at various stages
of the work, and Donald R. McNeil gave generously of technical advice
during the early stages of the analysis. I am particularly indebted to
John W. Tukey for his advice and sustained interest.


Some notable examples are found in the "Brass methods" (brought
together in Brass, W., 1975, Methods for Estimating Fertility and
Mortality from Limited and Defective Data, An Occasional Publication,
Chapel Hill: University of North Carolina, International Program of
Laboratories for Population Statistics); and in the Coale indices
(Coale, A.J., 1967, "Factors associated with the development of low
fertility: an historic summary," in New York: United Nations,
World Population Conference, 2, pp. 205-209) subsequently used in
a series of monographs on the decline of fertility in Europe (Coale,
A.J., Anderson, B.A., and Härm, E.; Knodel, J.E.; Lesthaeghe, R.;
Livi-Bacci, M.; Van de Walle, E., Princeton: Princeton University
Press; Forrest, J.D., Ph.D. dissertation, Princeton University.)

An example of productive reappraisal of admittedly flawed data, using
new techniques, is found in Barclay, G.W., Coale, A.J., Stoto, M.A.,
and Trussell, T.J., 1976, "A reassessment of the demography of
traditional rural China," Population Index 42, pp. 606-635.

See Keyfitz, N., 1977, Introduction to the Mathematics of Population,
with revisions, Reading, Mass.: Addison-Wesley, pp. 140-169, and
Brass, W., 1974, "Perspectives in population prediction: Illustrated
by the statistics of England and Wales," Jour. Royal Statist. Soc. A
137, pp. 532-583, for discussions of the most widely studied functions
and the extent to which they fall short of describing a range of fertility
distributions accurately.

Coale, A.J., 1971, "Age pattern of marriage," Population Studies
25, pp. 193-214.

Coale, A.J. and McNeil, D.R., 1972, "The distribution by age of the
frequency of first marriage in a female cohort," J. Amer. Stat.
Assoc. 67, pp. 743-749.

Coale, A.J. and Trussell, T.J., 1974, "Model fertility schedules:
variations in the age structure of childbearing in human populations,"
Population Index 40, pp. 185-258.


Since a given population may depart, to a greater or lesser degree,
from the assumptions and external standards on which the model
schedules are based, the model parameters may not, of course,
retain precise demographic meaning in fitting actual fertility
schedules. Coale and Trussell report that the discrepancy is especially
pronounced when marriage or childbearing patterns are changing
(loc. cit. in footnote 4, p. 193).

McNeil, D.R., and Tukey, J.W., 1975, "Higher-order diagnosis of
two-way tables, illustrated on two sets of demographic empirical
distributions," Biometrics 31, pp. 487-510.

Tukey, J.W., 1977, Exploratory Data Analysis, Reading, Mass.:
Addison-Wesley.

See Mosteller, F., and Tukey, J.W., 1977, Data Analysis and
Regression, Reading, Mass.: Addison-Wesley, particularly
pp. 351-358, for discussion of the desirable properties of the
biweight in such procedures.

Single-year  schedules  were  preferred  because  of  the  belief  that  the 
variability  in  data  at  this  level  of  detail  contains  useful  demographic 
information  not  captured  in  ten-year  or  five-year  averages  even 
when  systematic  long-term  changes  over  time  are  gradual. 

For  a concise  description  of  the  sources  of  Swedish  population 
statistics  from  the  earliest  times,  and  for  a discussion  of  the  quality 
of  the  data  and  the  adjustments  that  have  been  made  to  early  data, 
see  Hofsten,  E.  and  Lundstrom,  H.,  1976,  Swedish  Population 
History,  Stockholm:  National  Central  Bureau  of  Statistics. 

Although 1814 was the last year in which Sweden was actively engaged
in a war, subsequent conflicts have affected the country to a greater
or lesser degree. (For example, the possible effects of World War II
on Swedish fertility are considered in Hyrenius, H., 1946, "The
relation between birth rates and economic activity in Sweden 1920-
1944," Bulletin of the Oxford University Institute of Statistics 8,
pp. 14-21.)

The extensive records of economic and social variables also
encourage later tests of the value of a derived fertility model's parameters
in substantive research on economic and social change.

Sweden, 1878, Grunddragen af Sveriges Befolknings-Statistik för åren
1748-1875, Stockholm: National Central Bureau of Statistics.

Sweden, 1875-1910, Sveriges Officiella Statistik: Befolknings-
Statistik, Stockholm: National Central Bureau of Statistics.

Sweden, 1911-1959, Sveriges Officiella Statistik: Befolkningsrörelsen,
Stockholm: National Central Bureau of Statistics.

Data for single years through 1875, as adjusted for obvious
omissions and published by the Bureau in the single appendix
Grunddragen, were preferred to Sundbarg's later more extensively
revised figures by five-year periods up to 1860. Even with some
errors, a larger number of data points have advantages over
aggregated data for the exploratory type of analysis proposed here.

The  marital  fertility  history  covers  that  portion  of  the  overall 
fertility  history  for  which  recorded  data  allow  calculation  of  yearly 
age-specific  marital  fertility  rates.  Before  1892,  decennial  reports 
of  population  by  age  and  marital  status  combined  are  available, 
beginning  with  the  census  of  1870.  Reporting  of  confinements  by  age 
of  mother  and  legitimacy  of  birth  combined  began  in  1868. 

The  series  was  stopped  at  1959  because  of  the  wish  to  include  as 
full  a variety  of  age  patterns  of  childbearing  as  would  be  consistent 
with  related  analysis  of  overall  and  marital  fertility,  but  to  stop 
short  of  the  recent  increased  dissociation  of  childbearing  from 
marriage. There will be value now in extending the series to see
how new cohabitation and marriage patterns are influencing the age
distributions of overall and marital fertility as seen through EHR
parameters.

14.  Social  and  political  factors  contributing  to  eighteenth  and  early  nine- 
teenth century  marriage  patterns  are  discussed  in  Utterstrom,  G., 
1962,  "Labour  policy  and  population  thought  in  eighteenth  century 
Sweden,"  Scandinavian  Economic  History  Review  10,  pp.  262-279. 

A view  of  Swedish  marriage  changes  in  terms  of  proportion  of  years 
between  ages  15  and  50  lived  in  the  married  state  by  a birth  cohort 
of  women,  1751-1901,  will  be  found  in  Ryder,  N.B.,  "The  influence 
of  declining  mortality  on  Swedish  reproductivity,"  Current  Research 
in Human Fertility, Proceedings of a round table at the 1954 annual
conference, Milbank Memorial Fund, pp. 65-81. An analysis of
single-year marriage entry patterns of post-1850 birth cohorts will
be found in Ewbank, D. C., 1974, An Examination of Several
Applications of the Standard Pattern of Age at First Marriage,
Ph.D. dissertation, Princeton University.

15.  For  a summary  of  government  efforts,  beginning  in  1937,  to  encourage 
marriage in some segments of the population, see Glass, D. V., 1967,
Population Policies and Movements, London: Cass, pp. 327-331.

16.  Page  has  examined  the  effect  of  marriage  duration,  independent  of 
age, on the childbearing pattern in Sweden since 1911. (Page, H. J.,
1977, "Patterns underlying fertility schedules: A decomposition by
both age and marriage duration," Population Studies 31, pp. 85-106.)

17. In each year from 1911, when recording of births by duration of
marriage began, until 1959, the final year of the fertility sequences
analyzed in the present report, approximately 75-85% of all
legitimate births to women aged 15-19 and approximately 34-45% of all
legitimate births to women aged 20-24 are reported to have been
premaritally conceived. See Hofsten and Lundstrom, op. cit. in
footnote 10, pp. 26-29, as well as Sveriges Officiella Statistik:
Befolkningsrorelsen, 1913-1959.

18. Henry, L., 1961, "Some data on natural fertility," Eugenics Quarterly
8, pp. 81-91.

19. beginning in 1955, live births

20. Data by single year of age of mother are available after 1890, but
will not be considered in the present analysis since our interest here
is in demonstrating how much can be learned from more widely
available five-year age group data by this exploratory approach.
Considerable real irregularity is, of course, averaged out in the use
of these five-year age groups. Significant change (for example, in
marriage patterns) may also occur within such an age group.

21. These are not, of course, true cohorts, but overlapping
approximations cut on the bias.

22. The sum of age-specific rates for the five-year age groups,
$\sum_{j=1}^{7} f(a_j)$, expressed in the rate for women at age cut
49/50, will be referred to as "total rate." Multiplication of this
rate by five gives, for each cohort, the mean completed fertility per
woman, and gives for each cross-sectional schedule the conventional
expression of TFR, or mean number of children per woman over the
childbearing years of a synthetic cohort.
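
The arithmetic in this note can be illustrated with a short Python sketch. The seven age-specific rates are hypothetical values chosen only for illustration (they are not Swedish data), and the function names are assumptions of the sketch rather than anything used in the report.

```python
from itertools import accumulate


def total_rate(rates):
    """Sum of the age-specific rates for the five-year age groups."""
    return sum(rates)


def mean_children_per_woman(rates, width=5):
    """Total rate times the width of an age group in years (TFR for a cross-section)."""
    return width * total_rate(rates)


def cumulated_fractions(rates):
    """Rates cumulated to each age cut, as a fraction of the total rate."""
    total = total_rate(rates)
    return [c / total for c in accumulate(rates)]


# Hypothetical rates per woman per year for ages 15-19, 20-24, ..., 45-49.
rates = [0.020, 0.120, 0.160, 0.120, 0.080, 0.030, 0.005]

print(round(total_rate(rates), 3))
print(round(mean_children_per_woman(rates), 3))
print([round(x, 3) for x in cumulated_fractions(rates)])
```

The last cumulated fraction, at age cut 49/50, is 1 by construction, which is why only the six cuts 19/20 to 44/45 carry information about the shape of the distribution.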

23. The parameters for XX15 and XX20 can be expected to differ slightly
from those for X15 and X20 because the former are developed by
fitting the fertility experience of the last 68 years alone, divorced
from the experience of the preceding 117 years. The time parameters
$\alpha_i$ and $\beta_i$ for the XX15 and XX20 histories are seen
below to follow the same patterns of variation as those of the X15
and X20 histories, however, and to differ mainly in level (see pages
87 and 92). Fitting the shorter sequence alone does contribute to
fine-tuning of the fit for that portion of the longer sequence.

24. Results of a corresponding analysis of cohort overall and marital
fertility sequences for the last 38 cohorts, those aged 15-19 in
1892-1929, will be referred to but not reported in detail here.


25.  In  an  extended  EHR  analysis,  the  possible  value  of  further  truncation 
can  be  explored. 

26. McNeil and Tukey, loc. cit. in footnote 6.

27. Comparisons of results for the various combinations are included in
Breckenridge, M.B., 1976, Time Series Model of Age-Specific
Fertility: An Application of Exploratory Data Analysis, Ph.D.
dissertation, Princeton University.

28. The iterative fitting procedures and display programs implemented
by McNeil in APL for use with large data sets and an interactive
computer are brought together in McNeil, D.R., 1977, Interactive
Data Analysis, New York: Wiley.

29. E. J. Orav (1977, An Expanded Exploratory Data Analysis Study
of Age-Specific Fertility, Senior thesis, Princeton University) has
since tested a variety of five- and six-parameter models and one
eight-parameter model on the X15 sequence, using the folded square
root re-expression and various weightings.

30. While this measure is less sensitive to a small number of large
residuals than is a squared variation criterion of fit, it can still
produce a misleading impression of poor fit from a very good overall
fit with a few "outliers." This criterion of fit is best used, therefore,
in conjunction with detailed examination of residuals to determine
the nature of departures from fit. A recently developed robust
measure of variance of residuals proves, in many contexts, to be a
more useful measure of fit, one less distorted by outliers. This
$s_{bi}^2$, described in Mosteller and Tukey, op. cit. in footnote 8,
pp. 207-208, has been used, in a slightly modified form, in
extensions of the present work to simplify choice between fits.

31.  The  similarity  to  the  logit  which  Brass  has  used  so  productively 
will  be  noted. 

32. If the fitting had started instead with the time dimension of the
matrix, a similar, but probably not precisely equivalent fitted
description would have been generated.

33. The general case includes variation at either end cut in relation to
the other; but in a fertility distribution cumulated to age 45-49, the
variation at the last age cut is always small under usually
encountered circumstances.


9 


34. About 97% of the squared variation, or about 86% of the absolute
variation, is taken up by fitting a single time-independent cumulative
distribution.

35. While the robust measure of variance, $s_{bi}^2$ (see footnote 30),
was not used in the fitting process in the work reported here, values
of $s_{bi}^2$ of residuals for several sequences are included in
Table 1 for comparison.

36. No attempt has been made to pick "best fits."


37. With a folded linearizing re-expression (such as the folded square
root) which centers the distribution on its mean, the values at age
cuts near the center are changed relatively less than those at the
ends of the distribution in the process of re-expression. Therefore,
a residual of a given size, when it appears at the lowest or highest
age cuts, will have relatively less significance for the fitted
distribution on the raw fraction scale than will a residual of the
same size when it occurs at one of the central age cuts. For example,
with the folded square root re-expression, a residual of 0.01 will have
its highest de-transformed value, 0.007, at the center of the
cumulated normalized fertility distribution and will have progressively
lower de-transformed value toward either tail of the distribution.
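
The figure of 0.007 can be checked numerically. The sketch below assumes that the folded square root of a cumulated fraction f is sqrt(f) - sqrt(1 - f), with no additional scaling; that form is an assumption adopted here because it reproduces the 0.007 quoted above, and the function names are hypothetical.

```python
from math import sqrt


def folded_root(f):
    """Folded square root of a fraction f (assumed form: sqrt(f) - sqrt(1 - f))."""
    return sqrt(f) - sqrt(1.0 - f)


def unfold(y):
    """Inverse of folded_root: the fraction f whose folded square root is y."""
    return 0.5 * (1.0 + y * sqrt(2.0 - y * y))


def detransformed_residual(f, residual=0.01):
    """Change on the raw fraction scale produced by a residual on the folded scale."""
    return unfold(folded_root(f) + residual) - f


for f in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(f, round(detransformed_residual(f), 4))
```

The inverse follows from the identity (sqrt(f) - sqrt(1 - f)) * (sqrt(f) + sqrt(1 - f)) = 2f - 1, so no iterative solution is needed.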

38. This inclusion has generally been considered problematic because
of high rates of premarital pregnancy.

39. The difficulties which many models have in fitting the tails of
fertility distributions have often been dismissed as relatively
unimportant. Good fit in the tails may be of particular importance,
however, when total fertility is low or the age pattern of entry into
cohabitation is changing.

40. Tukey, op. cit. in footnote 7, pp. 155-156. 108; Mosteller and Tukey,
op. cit. in footnote 8, pp. 192-191.

41. These plots show as a box the interquartile range of the residuals
for each of the six age cuts 19/20 to 44/45, with location of the
median residual indicated by a bar. The relative positions of upper
and lower values within one interquartile distance of the upper and
lower quartiles are indicated by an x beyond each end of the box.
Outlying values within 1.6 times the interquartile distance of each
quartile are shown by empty circles, while residuals further out
are shown by shaded circles.


42. When an appropriate compound non-linear smoothing procedure
(see Tukey, op. cit. in footnote 7, pp. 205-264 and 523-542) is
applied to the residual vectors by age group to deemphasize
irregular fluctuations, thus providing more sensitive detection of
patterns of co-variation, the tilts in the scatterplots persist, further
supporting the sense of some remaining underlying structure in the
small residuals.

43. A triple multiplicative model, in combination with the folded square
root re-expression and c = 12 in the weight function, does remove
the long stretches of residuals above or below the central interval,
but does not remove inter-age group structure as seen in
scatterplots of the diminished residuals (Breckenridge, M. B., and
Orav, E. J., "An expanded EHR analysis of the age distribution of
fertility" (in preparation), which will combine results from Orav,
op. cit. in footnote 29, and parallel analysis and research by the
present author).

44. Anscombe, F.J., 1967, "Topics in the investigation of linear
relations fitted by the method of least squares," Jour. Royal Statist.
Soc., B 29, pp. 1-52.

45. In EHR analysis of post-lSb0* single-year-of-age cohort data for
Sweden, lagged pattern persists in the residuals, suggesting that
factors in addition to age grouping and method of constructing the
cohort sequences must be sought.

46. This question receives further attention below (see page 64).

47. Breckenridge, unpublished.

48. where rank-two refers to the sum of two rank-one terms, and a
rank-one term is the product of a constant by a function of row alone
by a function of column alone.

49. A modification of a program written by Alison Pollack was used for
this procedure.

50. Tukey, J.W., 1977, "Transfactorial fits. The linear geometry in
the two-way case."

51. For any one age group, the amount of change in f(a) on the raw
scale varies slightly and systematically over repeated increments
in $\beta_i$ to higher levels, holding $\alpha_i$ constant, or over
repeated increments in $\alpha_i$ to higher levels, holding $\beta_i$
constant--a consequence of having used a non-linear re-expression of
the data in the fitting procedure.




52. When one is considering less than all potential and functioning
activity states, e.g. the activity of married women only, or of
women above age 19 only, $\alpha_i A_j$ expresses the tendency for
activity to be pulled toward the median age of $\alpha_i A_j$ rather
than to be distributed evenly beyond the first age group, whatever
proportion of activity is attributable to that first age group.

53. At times, such sub-populations will be readily identifiable ethnic,
religious, regional, or occupational groups. Widespread incidence
of highly intermittent activity, with varying causes, may also be
a significant source of diversity between women contributing to
spread of the aggregate distribution.

54. Breckenridge and Orav, loc. cit. in footnote 43.

55. $A_j$ and $B_j$ for all sequences are shown in Appendix B, Table B3.

56. Henry, loc. cit. in footnote 18. These schedules are considered to
represent the aggregate childbearing experience of couples when
their behavior affecting fertility is not influenced by the number of
children already born to them. Such distributions are, however,
recognized to reflect cultural and biological variations which may
not be age- and parity-independent.

57. In the absence of "control," M is interpreted as the level at which
natural fertility is experienced; in the presence of control, however,
Trussell reports that M appears to be a composite of several
factors: not only the level of underlying natural fertility, but also
functions of total fertility or degree of control of fertility, and
variations in the distribution due to spacing at high levels of birth
limitation. These factors, he concludes, elude separation with the
existing models of age-specific marital fertility. (Trussell, T. J.,
1977, presented at the IUSSP seminar on natural fertility, Paris,
March 1977.)

58. Trussell (loc. cit. in footnote 57) discusses the need for modification
of v(a) to include a marriage-duration effect, and the problems of
doing this except in an overall fertility model.

59. The fact that $B_j$ is not pegged to any particular age may be of
particular importance in describing fertility distributions such as
those in the Swedish time sequences, in which low order births
influence the distribution well above age 25 for most of the period
analyzed.




60. Mean number of children per woman for the "natural" schedules
ranges from 10.9 to 6.2, compared to a high of 7.76 for MX20.

61. When Sundbarg's estimated age-specific marital fertility rates for
ages 20-49 for five-year periods, 1750-1890, are appended to the
68 single-year series for 1892-1959, and the whole sequence fitted
by the same EHR procedures used in this report, the 29 values of
$\alpha_i$ and $\beta_i$ covering the first 140 years fluctuate
slightly around the values of $\alpha_i$ and $\beta_i$ for 1892-1894.

62.  If  the  Coale  model  fits  a schedule  perfectly,  the  value  of  m will  be 
the  same  at  all  ages,  indicating  that  the  population  follows  the 
standard  age  pattern  of  decline  of  fertility  with  uniform  intensity. 

The  calculated  values  of  m for  these  Swedish  histories  do  show 
variability  with  age  for  any  given  year  and  vary  in  different  ways  in 
different  periods,  probably  due,  at  least  in  part,  to  the  effect  that 
changing  age  patterns  of  marriage  and  entry  into  childbearing  have 
had  on  the  cross-sectional  schedules. 

63. A new procedure for determining a single value of m by regression
(A. J. Coale, personal communication) also emphasizes the shape
of the schedule over the central ages of childbearing and omits some
higher ages entirely. By removing dependence of M on age group
20-24 (but leaving v(a) pegged to that age group), this procedure
provides for these Swedish schedules a time sequence of m with the
same pattern of variations as those in Fig. 10 but with values ranging
from .2 to 1.7. All of these procedures appear, then, to pick up the
same pattern of change over time in these schedules. The EHR
standard form parameter appears, however, to register more
fully in a single parameter (see pages 41 and 42) the change in age
distribution of fertility associated with limitation of births than do
the Coale-Trussell procedures, which variously divide the force of
this change between m acting on v(a) and M acting on n(a),
depending on the method of determining m. This division may, in
some instances, be of consequence in the use of the Coale-Trussell
model fertility schedules, which incorporate the Coale model of
marital fertility except for omission of M:

$$f(a) = G(a)\, n(a)\, e^{m\, v(a)}$$

where f(a) = age-specific overall fertility rate,
      G(a) = age-specific proportion ever-married.

To the extent that variable aspects of marital fertility would have
been incorporated in M acting on n(a), these aspects would be
absorbed by G(a), thus attributing to the age pattern of marriage
some of the variation in the overall fertility distribution actually due
to marital fertility.
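
As a rough illustration of footnotes 62 and 63, the sketch below computes the value of m implied at each age group under the Coale model of marital fertility, r(a) = M n(a) exp(m v(a)), with M pegged to age group 20-24, where v(a) = 0. The n(a) and v(a) values are illustrative placeholders, not the published standard schedules, and the function names are assumptions of the sketch.

```python
from math import exp, log


def coale_rates(M, m, n, v):
    """Marital fertility rates r(a) = M * n(a) * exp(m * v(a))."""
    return [M * n_a * exp(m * v_a) for n_a, v_a in zip(n, v)]


def m_by_age(r, n, v):
    """Value of m implied at each age group above 20-24, with M pegged to 20-24."""
    M = r[0] / n[0]  # v(a) = 0 at ages 20-24, so r = M * n there
    return [log(r_a / (M * n_a)) / v_a
            for r_a, n_a, v_a in zip(r[1:], n[1:], v[1:])]


# Placeholder schedules for ages 20-24, ..., 45-49 (illustrative only).
n = [0.46, 0.43, 0.40, 0.32, 0.17, 0.02]
v = [0.0, -0.3, -0.7, -1.0, -1.4, -1.7]

r = coale_rates(M=0.8, m=0.5, n=n, v=v)
print([round(x, 6) for x in m_by_age(r, n, v)])
```

When a schedule is generated by the model itself, as here, the same m is recovered at every age; variation of the computed m with age, as reported for the Swedish histories, indicates departure from the standard pattern.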

64. Total rate of marital fertility refers to the sum of age-specific rates
for the five-year age groups, $\sum_{j=1}^{7} f(a_j)$, expressed in
the rate for women at age cut 49/50.

65. See Mosteller and Tukey, op. cit. in footnote 8, pp. 268-270, for
discussion of this use of regression.

66. Impressions of change in marriage patterns are based on a summary
of some of Ewbank's findings for single-year birth cohorts of 1851-
1922 (Ewbank, loc. cit. in footnote 14). To be consistent with the
designation for childbearing cohorts by five-year age group used
throughout the present work, the mean and variance of age at
marriage for cohorts aged 15-19 in a given year is taken here as the
average of Ewbank's values for the five cohorts comprising those
who became 15-19 in that year.

    Year at Age 15-19          Age at Marriage

                         μ (range)       σ (range)
                           years           years

    1876-1879           27.71-27.61      6.62-6.60
    1880-1891           27.54-27.28      6.60-6.46
    1892-1911           27.22-27.09      6.53-6.34
    1912-1917           27.25-27.61      6.64-7.09
    1918-1921           27.70-27.75      7.07-6.75
    1922-1929           27.64-26.59      6.56-5.47
    1930-1941           26.41-24.46      5.37-4.5g

67.  Significant  levels  of  widowhood  or  divorce  at  childbearing  ages 
would,  of  course,  also  add  to  the  degree  of  positive  skewness  of 
the  overall  fertility  distribution. 

68. This expression of mean number of children per married woman is
synthetic first of all in the same sense as is the cross-sectional TFR
(expressed by (XT)(5)): it sums the fertility experience of women in
a number of different age cohorts at a given time as if this were the
childbearing experience of a single cohort over time. The expression
is synthetic in a second sense also: it sums at a given time the
fertility experience by age of those women who were then married, as
if all of these women had been in a single marriage cohort which
started childbearing at age 15-19 and bore children at each age at
the same rate as did those who were actually married at that age.


1 


Further work shows that both a legitimate fertility sequence and a
"married or actively cohabiting" sequence can be as well fit by EHR
analysis as can the overall and marital fertility sequences analyzed
here (Breckenridge, unpublished).

Recall that two triangles of incomplete cohorts are omitted--one at
the beginning of the sequence, one at the end--so that a matrix of
185 years provides 155 complete cohorts.
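
The count of 155 follows from the seven five-year age groups: a cohort aged 15-19 in a given year reaches ages 45-49 thirty years later, so only cohorts that start at least thirty years before the last year of the matrix are complete. A minimal sketch, with hypothetical function names, assuming a year-by-age-group matrix held as a dictionary keyed by year:

```python
def complete_cohorts(first_year, last_year, n_groups=7, width=5):
    """Years (at age 15-19) of cohorts whose whole schedule lies inside the matrix."""
    span = (n_groups - 1) * width  # 30 years from ages 15-19 to ages 45-49
    return list(range(first_year, last_year - span + 1))


def cohort_schedule(rates_by_year, year, n_groups=7, width=5):
    """Read one cohort along the diagonal: age group k is taken from year + 5k."""
    return [rates_by_year[year + width * k][k] for k in range(n_groups)]


cohorts = complete_cohorts(1775, 1959)
print(len(cohorts), cohorts[0], cohorts[-1])
```

For the 1775-1959 matrix this gives cohorts aged 15-19 in 1775 through 1929, the span quoted in the captions for the cohort sequences.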


Breckenridge  and  Orav,  loc.  cit.  in  footnote  43. 

The comparability of X20 and C20 time parameters can be
established in the same way from the appropriate pairs of $A_j$ and
$B_j$ age vectors, which are shown for all sequences in Appendix B,
Table B2.

The 1942-1945 aberration in $\alpha_i$ for both X15 and X20 and the
post-1950 rise in this parameter for X15 have already been noted in
the shorter time sequences, XX15 and XX20 (see page 36).


There,  after  1909,  only  the  cohorts  of  1914,  1917,  1918,  1920  and 
1921  fit  the  model  without  definite  departure  for  at  least  one  age  cut. 


Ewbank,  loc.  cit.  in  footnote  66. 

Hofsten and Lundstrom, op. cit. in footnote 10.


The year 1792 is notable for the assassination of King Gustavus III
after a period of political unrest. Severe famine is variously
recorded for years from 1780 to 178T (Thomas, D.S., 1949, Social
and Economic Aspects of Swedish Population Movements, 1750-1933,
New York: Macmillan, pp. 81-88, 102-108, identifies 1780-1783 and
1785 as years of major crop failures. Utterstrom, G., 1954, "Some
population problems in pre-industrial Sweden," Scandinavian
Economic History Review 2, pp. 103-165, questions the harvest
index which was Thomas' criterion, and identifies 1783-1785 as
famine years.) Whether these circumstances affected the actual
level and age distribution of births in the years in question, or
whether they led to errors in population estimates or in recording
of births is open to investigation.


[Plot not reproduced. Axes: residual, by age cut, against year, 1770-1970.]

Fig. 1. Time sequence plot of residuals by age cut from EHR fitting
(with c = 6 in the biweight) of $F_{ij} = \alpha_i A_j + \beta_i B_j$
to the cross-sectional age 20-49 overall fertility sequence, 1775-1959
(with data expressed on the raw fraction scale).


[Plot not reproduced. Axes: residuals against age cut, 19/20 to 44/45.]

                              RESIDUALS
                       Lower               Upper
AGE CUT    Minimum    quartile   Median   quartile   Maximum

19/20      -.0366     -.0060     -.0012    .0042      .0637
24/25      -.0956     -.0062     -.0015    .0054      .0240
29/30      -.1292     -.0068      .0004    .0058      .0236
34/35      -.0670     -.0048      .0000    .0062      .0265
39/40      -.0223     -.0057      .0002    .0057      .0390
44/45      -.0180     -.0054      .0011    .0044      .0515

Fig. 2. Schematic plots of residuals by age cut (folded square root scale)
from EHR fitting of $F_{ij} = \alpha_i A_j + \beta_i B_j$ to the cohort
age 15-49 overall fertility sequence, 1775-1929.


[Plot not reproduced. Axes: residuals against age cut, 19/20 to 44/45.]

                              RESIDUALS
                       Lower               Upper
AGE CUT    Minimum    quartile   Median   quartile   Maximum

19/20      -.0138     -.0022      .0004    .0029      .oi
24/25      -.0157     -.0052      .0000    .0056      .0119
29/30      -.0136     -.0032      .0006    .0029      .0109
34/35      -.0152     -.0052     -.0008    .0051      .0128
39/40      -.0109     -.0051     -.0015    .0040      .0258
44/45      -.0234     -.0069     -.0017    .0069      .0241

Fig. 3. Schematic plots of residuals by age cut (folded square root scale)
from EHR fitting of $F_{ij} = \alpha_i A_j + \beta_i B_j$ to the
cross-sectional age 15-49 overall fertility sequence, 1775-1959.


[Scatter plots not reproduced. Panels A, B, and C; axis labels: age cut
24/25; age cut 29/30, age cut 39/40, age cut 44/45.]

Fig. 4. Scatter plots of residuals for pairs of age cuts (folded square
root scale) from EHR fitting of $F_{ij} = \alpha_i A_j + \beta_i B_j$
to the cross-sectional age 15-49 overall fertility sequence, 1775-1959.


[Plot not reproduced. Axes: residual, by age cut (19/20 to 44/45),
against year, 1770-1930.]

Fig. 5. Time sequence plot of coded residuals by age cut (raw fraction
scale) from EHR fitting of $F_{ij} = \alpha_i A_j + \beta_i B_j$ to the
cross-sectional age 15-49 overall fertility sequence, 1775-1959.


[Plot not reproduced. Axes: residual, by age cut, against year at age
15-19.]

Fig. 6. Time sequence plot of coded residuals by age cut (raw fraction
scale) from EHR fitting of $F_{ij} = \alpha_i A_j + \beta_i B_j$ to the
cohort age 15-49 overall fertility sequence, 1775-1929.

[Plot not reproduced. Axes: residual, by age cut (19/20 to 44/45),
against year, 1890-1960.]

Fig. 7. Time sequence plot of coded residuals by age cut (raw fraction
scale) from EHR fitting of $F_{ij} = \alpha_i A_j + \beta_i B_j$ to the
cross-sectional age 15-49 marital fertility sequence, 18...




[Plot not reproduced. Horizontal axis: year.]

Fig. 9. EHR standard form time parameters $\alpha_i$ and $\beta_i$, and
total rate of fertility, for cross-sectional age 15-49 and age 20-49
overall fertility sequences, 1892-1959.


[Plot not reproduced. Horizontal axis: year, 1890-1960.]

Fig. 11. EHR standard form marital fertility time parameters $\alpha_i$
and $\beta_i$, linearly compensated for total rate of marital fertility
(cross-sectional age 15-49 and age 20-49 marital fertility
sequences, 1892-1959).


[Plot not reproduced. Horizontal axis: year, 1890-1960.]

Fig. 12. EHR standard form overall fertility time parameters $\alpha_i$
and $\beta_i$ (cross-sectional age 15-49 and age 20-49 overall fertility
sequences, 1892-1959) linearly compensated for the corresponding
EHR standard form marital fertility time parameters (cross-
sectional age 15-49 and age 20-49 marital fertility sequences,
1892-1959).


Fig. 14. EHR standard form time parameters $\alpha_i$ and $\beta_i$, and
total rate of fertility, for cross-sectional age 15-49 and age 20-49
overall fertility sequences, 1775-1959.




APPENDIX A

A Robust/Resistant Procedure for the Iterative Fitting
of Two Multiplicative Components to an M x N Matrix

The iterative fitting procedure used in fitting two multiplicative
components to a fertility matrix, $F_{ij}$, can be summarized as follows:

Cellwise weights $w_{ij}$ are based on the bisquare function

$$w(u) = (1 - u^2)^2 \quad \text{for } |u| < 1, \qquad w(u) = 0 \quad \text{otherwise.}$$

Starting with

$$w_{ij}^{(0)} = \left[ 1 - \left( F_{ij} / c\,s^{(0)} \right)^2 \right]^2$$

where $s^{(0)} = \text{median}\, |F_{ij}|$
      $c$ = a constant of assigned value
      (so that residuals of size greater
      than c times the median absolute
      deviation are given zero weight)

$$A_j^{(0)} = \sum_{i} F_{ij}\, w_{ij}^{(0)} \Big/ \sum_{i} w_{ij}^{(0)}$$

$$\alpha_i^{(0)} = 1$$
i 


and designating the residuals at iteration m

$$z_{ij}^{(m)} = F_{ij} - \alpha_i^{(m)} A_j^{(m)}$$

and weights at iteration m

$$w_{ij}^{(m)} = \left[ 1 - \left( z_{ij}^{(m)} / c\,s^{(m)} \right)^2 \right]^2$$

where $s^{(m)} = \text{median}\, |z_{ij}^{(m)}|$


the estimators of $A_j$ are improved

$$A_j^{(m+1)} = A_j^{(m)} + \sum_{i} z_{ij}^{(m)} w_{ij}^{(m)} \alpha_i^{(m)} \Big/ \sum_{i} w_{ij}^{(m)} \left( \alpha_i^{(m)} \right)^2$$



and standardized

$$A_j^{(m+1)} \leftarrow A_j^{(m+1)} \Big/ \sqrt{ \sum_{j} \left( A_j^{(m+1)} \right)^2 }$$


and the estimators of $\alpha_i$ are improved

$$\alpha_i^{(m+1)} = \alpha_i^{(m)} + \sum_{j} z_{ij}^{(m+1)} w_{ij}^{(m+1)} A_j^{(m+1)} \Big/ \sum_{j} w_{ij}^{(m+1)} \left( A_j^{(m+1)} \right)^2$$


Iterations continue until a selected convergence criterion is met

$$1 - \left[ \left( c\,s^{(m+1)} \right)^2 \Big/ \left( c\,s^{(m)} \right)^2 \right] < \epsilon$$


The residuals $z_{ij}^{(m+n)}$ from this portion of the fitting procedure
are then examined in the same way for $B_j$, beginning with

$$w_{ij}^{(m+n)} = \left[ 1 - \left( z_{ij}^{(m+n)} / c\,s^{(m+n)} \right)^2 \right]^2$$

$$B_j^{(0)} = \sum_{i} z_{ij}^{(m+n)} w_{ij}^{(m+n)} \Big/ \sum_{i} w_{ij}^{(m+n)}$$

$$\beta_i^{(0)} = 1$$

and iterating to convergence in $B_j^{(p)}$ and $\beta_i^{(p)}$.

The two-stage procedure is then repeated for the residuals
$z_{ij}^{(m+n+p)}$ and so on iteratively to convergence in final
estimates of $A_j$, $\alpha_i$, $B_j$, $\beta_i$.

Optimal values of c appear to vary somewhat with the data and the
desired degree of resistance to outliers. Values between 6 and 9 are
commonly useful. Least squares estimators are approached as
$c \to \infty$.
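
The first stage of the procedure (fitting one multiplicative component) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the APL implementation referred to in the notes: the denominators are taken to be the weighted sums of squares of a weighted least-squares step, one set of weights is used throughout each iteration, and the guards against zero scale or zero denominators are additions of the sketch. All function names and the test matrix are hypothetical.

```python
from math import sqrt
from statistics import median


def bisquare(z, scale):
    """Bisquare weight (1 - u^2)^2 for |u| < 1 and 0 otherwise, with u = z / scale."""
    if scale <= 0:  # guard for an exact fit (an assumption of this sketch)
        return 1.0 if z == 0 else 0.0
    u = z / scale
    return (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0


def fit_one_component(F, c=6.0, eps=1e-10, max_iter=100):
    """Resistant fit of F[i][j] ~ alpha[i] * A[j]; returns (alpha, A, residuals)."""
    rows, cols = range(len(F)), range(len(F[0]))

    def weights(Z):
        cs = c * median(abs(Z[i][j]) for i in rows for j in cols)
        return [[bisquare(Z[i][j], cs) for j in cols] for i in rows], cs

    def residuals(alpha, A):
        return [[F[i][j] - alpha[i] * A[j] for j in cols] for i in rows]

    def step(num, den):
        return num / den if den > 0 else 0.0

    # Starting values: weighted column means for A, and alpha = 1.
    w, _ = weights(F)
    A = [step(sum(F[i][j] * w[i][j] for i in rows), sum(w[i][j] for i in rows))
         for j in cols]
    alpha = [1.0] * len(F)

    previous = None
    for _ in range(max_iter):
        Z = residuals(alpha, A)
        w, cs = weights(Z)
        # Improve A, then standardize it to unit sum of squares.
        A = [A[j] + step(sum(Z[i][j] * w[i][j] * alpha[i] for i in rows),
                         sum(w[i][j] * alpha[i] ** 2 for i in rows)) for j in cols]
        norm = sqrt(sum(a * a for a in A)) or 1.0
        A = [a / norm for a in A]
        # Improve alpha from residuals recomputed with the new A.
        Z = residuals(alpha, A)
        alpha = [alpha[i] + step(sum(Z[i][j] * w[i][j] * A[j] for j in cols),
                                 sum(w[i][j] * A[j] ** 2 for j in cols)) for i in rows]
        if previous and abs(1.0 - (cs / previous) ** 2) < eps:
            break
        previous = cs
    return alpha, A, residuals(alpha, A)


# Hypothetical 8 x 5 matrix: an exact product plus small noise and one gross outlier.
alpha_true = [1.0 + 0.1 * i for i in range(8)]
A_true = [0.2, 0.4, 0.5, 0.6, 0.4]
F = [[alpha_true[i] * A_true[j] + 0.001 * ((2 * i + 3 * j) % 5 - 2) for j in range(5)]
     for i in range(8)]
F[3][2] += 1.0
alpha, A, Z = fit_one_component(F)
print(round(Z[3][2], 2))
```

The second component would be obtained by applying the same function to the residual matrix Z, and the two stages alternated until the estimates settle, as described above. The outlying cell receives zero weight once its residual exceeds c times the median absolute residual, so it remains visible in Z instead of distorting the fit.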




Table B1. Fitted Age Parameters (folded square root scale)
Derived by EHR Analysis of the Age Distribution
of Overall and Marital Fertility in Selected Time
Sequences, 1775-1959

                                  Age Cut
Sequence     19/20    24/25    29/30    34/35    39/40    44/45

Parameter $A_j$

X15         -.5835   -.3504   -.0889    .1620    .3949    .5888
XX15        -.5331   -.2570    .0108    .2453    .4530    .6199
C15         -.5810   -.3478   -.0845    .1669    .3986    .5895
MX15        -.1093    .1154    .2802    .4190    .5449    .6510
X20                  -.4620   -.1323    .1816    .4729    .7159
XX20                 -.3393   -.0055    .2793    .5284    .7264
C20                  -.4467   -.1149    .1969    .4829    .7178
MX20                 -.1429    .1431    .3661    .5595    .7156

Parameter $B_j$

X15          .2820    .5047    .5522    .4772    .3329    .1493
XX15         .3579    .5229    .5502    .4639    .2818    .0349
C15          .2177    .4860    .5623    .4976    .3627    .1455
MX15         .6627    .5339    .3907    .2171   -.0120   -.2754
X20                   .6255    .6079    .4404    .1954   -.0834
XX20                  .5862    .6048    .4780    .2344   -.0850
C20                   .5584    .6029    .4885    .2926    .0217
MX20                  .6314    .5783    .4143    .1175   -.2855


Table B2. Results of Regressions to Fix EHR-Fitted Fertility
          Distribution Parameters for Re-presentation of Fits
          in a Standard Form

                                                      Canonical Regression
            LS Regression     Degree of Fit     Elements of First    Eigenvalue for
            Coefficients      (linear scale)    Eigenvector          First Canonical
Sequence      a        b                          c        d         Variate

X15         .6867    .0062        .9903         .9875   -.0899         .9989
X20         .8403   -.2862        .9921         .8948   -.2724         .9996
XX15        .6880   -.0997        .9932         .9532   -.2536         .9997
XX20        .8061   -.2560        .9934         .9321   -.3562         .9999
C15         .9505   -.0249        .9828         .9934   -.0886         .9989
C20        1.075    -.2428        .9846         .9415   -.2808         .9997
MX15        .6951   -.4293        .9893         .8070   -.5936         .9994
MX20        .7808   -.5033        .9915         .9463   -.3288         .9999



Table B3. Fitted Age Parameters (folded square root scale)
          After Standard Form Re-presentation of EHR Fits
          to the Age Distributions of Overall and Marital
          Fertility in Selected Time Sequences, 1775-1959

              alpha_i*                                  Age Cut
Sequence    mu      sigma      Parameter   19/20    24/25    29/30    34/35    39/40    44/45

X15        1.476    0.019      A_j*       -.6016   -.3915   -.1375    .1170    .3600    .5680
XX15       1.468    0.013                 -.5989   -.3776   -.1292    .1158    .3603    .5821
C15        1.479    0.030                 -.5965   -.3886   -.1337    .1217    .3639    .5728
MX15       1.226    0.018                 -.4816   -.2238   -.0058    .2093    .4468    .6888
X20        1.205    0.013                          -.5838   -.2839    .0425    .3699    .6633
XX20       1.187    0.011                          -.5250   -.2206    .0898    .4087    .7075
C20        1.203    0.023                          -.5774   -.2775    .0482    .3725    .6697
MX20       1.106    0.013                          -.3428   -.0547    .2102    .4908    .7711

            range of beta_i*
X15        -0.039 to 0.843     B_j*        .2870    .5074    .5525    .4754    .3291    .1438
XX15        0.037 to 0.815                 .2801    .4846    .5502    .4983    .3468    .1245
C15        -0.062 to 0.633                 .2025    .4771    .5603    .5022    .3733    .1610
MX15        0.473 to 1.144                 .5056    .5140    .4788    .4042    .2757    .1075
X20         0.191 to 0.971                          .4826    .5802    .5178    .3674    .1654
XX20        0.140 to 0.849                          .4575    .5758    .5408    .3834    .1402
C20         0.168 to 0.802                          .4527    .5709    .5273    .3975    .1819
MX20        0.204 to 0.711                          .4521    .5622    .5451    .4008    .1474


APPENDIX  C 

Table C1. The Age Distributions of Natural Fertility, Reported(1) and Fitted as
          Weighted Sums of the A_j* and B_j* Derived by EHR Analysis of the
          Swedish Age 20-49 Marital Fertility Time Sequence for 1892-1959

                               EHR Time Parameter                      Age Group
Source of Distribution         alpha_i*  beta_i*    20-24   25-29   30-34   35-39   40-44   45-49

EHR Standard A_j*
  with lowest alpha_i*          1.080    0          .2473   .2110   .2001   .1891   .1284   .0241
  with highest alpha_i*         1.138    0          .2348   .2212   .2107   .1961   .1238   .0134

Hutterites, 1921-1930
  Reported                                          .2514   .2294   .2043   .1856   .1015   .0279
  Fitted                        1.083    .0576      .2613   .2177   .1990   .1807   .1180   .0213
  Residual                                         -.0119   .0117   .0053   .0049  -.0175   .0066

Canada, 1700-1730
  Reported                                          .2358   .2293   .2242   .1899   .1070   .0139
  Fitted                        1.151    .0354      .2422   .2274   .2119   .1920   .1161   .0105
  Residual                                         -.0064   .0019   .0123  -.0021  -.0001   .0034

Hutterites, before 1921
  Reported                                          .2425   .2302   .2169   .1909   .1046   .0148
  Fitted                        1.141    .0495      .2482   .2274   .2098   .1887   .1145   .0114
  Residual                                         -.0057   .0028   .0071   .0022  -.0009   .0034

Europeans of Tunis, 1840-1859
  Reported                                          .2562   .2354   .2200   .1773   .1040   .0071
  Fitted                        1.165    .0948      .2561   .2365   .2123   .1840   .1039   .0071
  Residual                                          .0001  -.0009   .0077  -.0067   .0001   .0000

Crulai, 1674-1742
  Reported                                          .2643   .2523   .2252   .1682   .0841   .0060
  Fitted                        1.182    .1650      .2727   .2472   .2125   .1742   .0896   .0039
  Residual                                         -.0084   .0051   .0127  -.0060  -.0055   .0021

Norway, 1874-1876
  Reported                                          .2434   .2336   .2096   .1776   .1106   .0252
  Fitted                        1.091    .0449      .2577   .2179   .2009   .1836   .1107   .0201
  Residual                                         -.0143   .0157   .0087  -.0060  -.0001   .0051

Bourgeoisie of Geneva, before 1600
  Reported                                          .2602   .2421   .2187   .1830   .0823   .0127
  Fitted                        1.155    .1296      .2683   .2385   .2092   .1773   .0991   .0075
  Residual                                         -.0081   .0036   .0095   .0066  -.0168   .0052

Taiwan, c. 1900
  Reported                                          .2626   .2403   .2201   .1892   .0820   .0058
  Fitted                        1.174    .1287      .2639   .2419   .2126   .1704   .0060   .0053
  Residual                                         -.0013  -.0016   .0075   .0008  -.0149   .0005

Bourgeoisie of Geneva, 1600-1649
  Reported                                          .2788   .2576   .2278   .1524   .0740   .0085
  Fitted                        1.163    .2184      .2928   .2490   .2065   .1636   .0833   .0048
  Residual                                         -.0140   .0086   .0213  -.0112  -.0084   .0037

Sotteville-Les-Rouen, 1760-1790
  Reported                                          .2682   .2514   .2291   .1760   .0608   .0056
  Fitted                        1.200    .1849      .2744   .2526   .2147   .1725   .0837   .0021
  Residual                                         -.0062  -.0012   .0144   .0035  -.0139   .0035

Iran, 1940-1950
  Reported                                          .2642   .2475   .2174   .1706   .0870   .0134
  Fitted                        1.141    .1460      .2762   .2377   .2061   .1733   .0978   .0080
  Residual                                         -.0120   .0098   .0113  -.0027  -.0108   .0045

India, 1945-1946
  Reported                                          .2609   .2326   .2278   .1712   .0808   .0267
  Fitted                        1.093    .2395      .2769   .2274   .2001   .1735   .1063   .0158
  Residual                                         -.0160   .0052   .0277  -.0023  -.0255   .0109

(1) Data from Henry, loc. cit. in footnote 18.



APPENDIX D


THE RELATIONSHIP OF EMPIRICAL ANALYSIS TO
MORE NARROWLY MODELLED ANALYSIS

John W. Tukey

Princeton University* and Bell Laboratories
Princeton, New Jersey 08540 and Murray Hill, New Jersey 07974


ABSTRACT

If we are to make proper use of both empirical analysis and more narrowly
modelled analysis - in particular to make good use of both EHR on the one
hand and the Coale-Trussell fertility schedules (and their analogs and
generalizations) on the other - we need to understand quite clearly both the
characteristics of the two approaches and their interrelation. The discussion
that follows is intended to be a step toward such understanding.


1. Kinds of "Models"

The word "model" is one of those which means quite different things to different people --
or to the same person at different times. At one extreme it may be both almost completely
normative and very precise, as in the mathematical expressions which describe the motions of
two (or three) bodies under Newtonian gravitation. Here the discovery of "unexplained"
(meaning "beyond the narrow model") deviations can be of great importance, as when the
advance of the perihelion of Mercury was vital in the assessment of Einstein's theory of
relativity. The existence of such precise normative models almost always seems to depend upon a
long series of interactions between experiment or experience on the one hand and concepts and
theory on the other.

At  another  extreme  lie  "models",  like  those  discussed  in  the  next  section,  that  are  highly 
adaptable,  because  they  involve  so  many  more  constants,  which  can  be  adjusted  to  give  a good 
fit  and  which,  because  of  the  diverse  kinds  of  behavior  to  which  these  constants  are  adapted, 
are  thought  of  almost  entirely  as  providing  empirical  descriptions.  Here  the  emphasis  is  on  the 
ability  to  describe  very  diverse  phenomena  in  a single  way,  and  the  discovery  of  more  or  less 
systematic deviations is often a call to increased flexibility - to the use of still more general
"models" to absorb these deviations.

In general, a "model" seems to tend to contain two elements, the collection of things from
which one is to be selected to describe a particular instance (the "stock") and, often, explicit or
implied guidance to aid in interpreting the meaning of whichever element of the stock is
selected (the "guidance"), although -- especially in the two extremes just discussed -- the latter
element is often very weak -- or even nonexistent.

In the context of multiple regression, Mosteller and I (1977) have introduced the word
"stock" (the word "posse" has also been used) for the collection of possibilities that are to be fit
-- from which one is to be selected as a useful description. In the extremely flexible case just
described we are concerned with "broad stocks". (By contrast, a stock involving only a few,
hopefully well-selected, constants would be a "narrow stock".)


*Prepared in part in connection with research at Princeton University sponsored by the Army Research Office
(Durham).


April 28, 1978



The second extreme, in which flexibility is emphasized but guidance has yet to be
included, is reasonably referred to as involving "broad empirical models". (The fact that
guidance is avoided initially need not mean that it cannot be added, as Breckenridge's paper
illustrates. It may well be very desirable to start with an emphasis on adaptability and consequent
good fit and then move to an emphasis on guidance in the interpretation of specific fits.)

Still another extreme is given by relatively narrow (e.g. few-parameter) models where it
is felt that the way in which the constants enter into the algebraic or other expressions is such
that we can make useful interpretations of changes in any particular one. A good type instance
might be compartment models in biology, in which the passage of a traceable substance through
the body -- perhaps in and out of the blood stream -- is modelled in terms of very simple
differential equations -- differential equations in which only the rate constants are to be fitted to
whatever data has been observed. Here a change in one constant may be rightly given a
different interpretation than a change in another. However, there need be no feeling that close
similarity of actual occurrence to what can be modelled is essential (or even very likely).
Deviations, if not too large, are often recognized as something to be anticipated and overlooked. We
might call such models "separating models" since their main purpose is to separate information
into pieces that, at least hopefully, bear upon separate aspects of what is being studied. Here
guidance is an important part of the model. As a consequence, for example, the concern of
economists for identifiability is natural.

A very important class of models (unfortunately frequent, as some would say; sometimes
hard to separate from the previous class) are those well described as "narrow empirical models".
Here we have found a way to describe most of the detail of some behavior in terms of a few
constants. This offers two great advantages. First, we can often compare situations more
effectively and more intuitively if we have each of them described in terms of only a few
numbers. Second, we can usually gain precision by estimating only a few constants from the
data, leaving the bulk of the impact of irregularities, deviations, and sampling fluctuations to
the residuals. (As compared with broad empirical models, these models will involve fewer
constants, perhaps many fewer.) Here, in contrast to separating models, guidance is prominent by
its absence, and fit may be less than, or even far from, perfect -- imperfection of fit being
accepted in return for the two advantages just cited.

The last class of "models" we shall choose to mention here is that of systems of
"transferring models" by whose aid we hope we can effectively transfer what can be learned from
observations of very different sorts into common terms. Economists hope that some of their models
are of this kind, as when they compare the results of cross-sectional studies with those of
studies of time series. Here guidance is likely to be much more important than stock.
Bridgman's (1927) discussion of operational constructs in physics, in which, for example,
masses measured in different ways represent different concepts, illustrates the delicacy of such
transfers.

In  thinking  about  the  list  of  model  types  just  sketched 

• normative  models 

• broad  empirical  models 

• separating  models 

• narrow  empirical  models 

• transferring  models  (or  model  systems) 

we  need  to  be  careful  to  remember  that  these  have  been  isolated  as  characteristic  extremes,  and 
that  many  real  situations  are  likely  to  be  mixtures  of  at  least  two  of  them. 

Finally, a mathematician might hope for very frequent successful occurrence of
"interpolatory models" with whose aid careful measurement in a few well-selected situations tells us about
what will happen in many other (intermediate) situations. Such models are at least very close
to being precise normative ones. The models used in such fields as the strength of materials
and rates of chemical reactions often come close to doing just this. Outside of classical physical
science, however, such models seem infrequent.


2. The general character of combinational broad stocks, such as those used in EHR

It is important to recognize that the expressions fitted in empirical higher rank analysis (in
EHR) are selected from broad stocks and do belong to one or another class of very flexible
stocks. These classes make very little, if any, use of our understandings of the mechanism
underlying the data. They strive to mobilize their intrinsic flexibility and to be guided by the
data in the way that they dispose this flexibility to provide relatively close description. As
Breckenridge has emphasized, they are usually well adapted to relatively automatic generalization --
something that can be more difficult for narrower models.

The classes of such flexible models so far of greatest importance are defined in terms of
the simplest arithmetic operations, beginning with a single "+" sign (or perhaps two such).

Additive fits with two crossed categories take the form

$$y_{ij} = a + b_i + c_j$$

and are often made of the greatest use by writing

$$y_{ij} = q(\mathrm{data}_{ij})$$

where q is a well-chosen monotone function (the choice $q(x) = \log x$ is but one frequent
example). This sort of approach not only underlies the widespread ramifications of the analysis of
variance (perhaps the most widely used of our nonelementary statistical procedures) but also
plays a key role in axiomatized fundamental measurement (Luce and Tukey, 1964).

Multiplicative fits are almost a twin sib to additive fits, as the formula

$$Y_{ij} = A B_i C_j$$

and the transfer rule

lower case letter = log of capital letter

shows for multiplicative fits involving only positive A, B and C. Their importance here lies
not in this twin-relation but in their facility for generalization.
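
The twin relation can be checked numerically. The following is a small sketch with made-up positive constants (the values of `A`, `B` and `C` are illustrative assumptions, not data from this report): taking logs of a multiplicative table reproduces an additive table whose lower-case constants are the logs of the capital ones.

```python
import math

# Hypothetical positive constants for a 3-by-4 multiplicative table.
A = 2.0
B = [1.0, 1.5, 3.0]          # one value per row
C = [0.5, 2.0, 4.0, 8.0]     # one value per column

# Multiplicative fit: Y_ij = A * B_i * C_j
Y = [[A * b * c for c in C] for b in B]

# Transfer rule: lower case letter = log of capital letter.
a = math.log(A)
b_low = [math.log(x) for x in B]
c_low = [math.log(x) for x in C]

# Largest gap between log Y_ij and the additive form a + b_i + c_j.
max_gap = max(abs(math.log(Y[i][j]) - (a + b_low[i] + c_low[j]))
              for i in range(len(B)) for j in range(len(C)))
print(max_gap)  # differs from zero only by floating-point rounding
```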

Not only

$$A + B_i C_j,$$

but

$$B_i C_j + D_i E_j$$

and

$$A + B_i C_j + D_i E_j$$

offer convenient generalizations of simple multiplicative models, conveniently described as
higher-rank models. (We find using this general-sounding term fairly freely for a somewhat
special-appearing class reasonable, in part because the twin here

$$a (b_i + c_j)(d_i + e_j)$$

has not, at least as yet, often proved to be a stage of description that was helpful on our way to
understanding.)

We can go on easily to still higher complexities, to stocks in which we sum still more
terms of the form

(function of row) x (function of column).



3. Comments on EHR analysis.

A number of comments about the use of EHR and the corresponding stocks deserve
attention here. The most important have to do with re-expression and re-presentation.

Re-expression can greatly influence the satisfactoriness with which a well-selected example
from a stock of a given kind describes the data. We are familiar with this in circumstances
where we understand, in detail, what is happening. We tend to forget that it is almost equally
likely to be so when we face less-understood (perhaps still-impenetrable) situations in a very
empirical mood.

If we were given, for a collection of cylinders, volumes, cross-sectional areas, and lengths,
we would shrink from anyone who proposed to fit volumes with

(a function of cross-section) PLUS (a function of length)

since we would recognize the need for TIMES instead of PLUS. Even in this case, we might
not stop to think that, if we worked with log volume instead of volume - a very simple
re-expression -- we could use the additive broad stock very effectively.
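
The cylinder example can be made concrete with a short sketch. The areas and lengths below are hypothetical, and the additive fit is done by ordinary row and column means (an assumption chosen for brevity; a resistant fit would serve equally well). The PLUS fit leaves large residuals on raw volume and essentially none on log volume.

```python
import math


def additive_residuals(table):
    """Residuals from a two-way additive fit by row and column means."""
    n, k = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n * k)
    row_eff = [sum(row) / k - grand for row in table]
    col_eff = [sum(table[i][j] for i in range(n)) / n - grand for j in range(k)]
    return [[table[i][j] - grand - row_eff[i] - col_eff[j] for j in range(k)]
            for i in range(n)]


def worst(res):
    """Largest absolute residual in the table."""
    return max(abs(r) for row in res for r in row)


# Hypothetical cylinders: every cross-sectional area paired with every length.
areas = [1.0, 2.0, 4.0]
lengths = [1.0, 3.0, 9.0]
volume = [[s * l for l in lengths] for s in areas]
log_volume = [[math.log(v) for v in row] for row in volume]

raw_misfit = worst(additive_residuals(volume))      # PLUS on raw volume
log_misfit = worst(additive_residuals(log_volume))  # PLUS on log volume
print(raw_misfit, log_misfit)
```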

If we had data on blood pressures, I fear we would be much less likely to shun the PLUS
analysis of raw blood pressures, much less likely to pounce on the advantages of a PLUS
analysis of log blood pressure, though such an advantage would be there. And as we move
toward even less understood data, we are even more likely to "miss the boat" when
re-expression would help. There is no intrinsic reason for this; we have only failed to learn to
take advantage of our opportunities.

The issue of re-presentation, discussed by Breckenridge above, is of a very different kind.
Where re-expression sought for us a way to find more useful fits, more useful by doing a better
job of fitting, we now are trying to do a more useful job of looking at the exact same fit. As a
simple example, consider a fit

$$2AB + 2CD$$

which can also be written

$$(A + C)(B + D) + (A - C)(B - D)$$

(Notice carefully that these two forms are algebraically identical, as can easily be seen by
multiplying out the second form.) Here there is no question of changing the fit, only of rewriting it.
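
A quick numerical check of this identity, as a sketch with arbitrary made-up factors (A and C indexed by row, B and D by column; the indexing and the values are assumptions for illustration):

```python
# Hypothetical row factors (A, C) and column factors (B, D).
A = [1.0, -2.0, 0.5]
C = [3.0, 0.25, -1.5]
B = [2.0, 4.0]
D = [-1.0, 0.5]

rows, cols = range(len(A)), range(len(B))

# First presentation: 2AB + 2CD
first = [[2 * A[i] * B[j] + 2 * C[i] * D[j] for j in cols] for i in rows]

# Second presentation: (A + C)(B + D) + (A - C)(B - D)
second = [[(A[i] + C[i]) * (B[j] + D[j]) + (A[i] - C[i]) * (B[j] - D[j])
           for j in cols] for i in rows]

# The fitted values agree cell by cell; only the factors look different.
max_gap = max(abs(first[i][j] - second[i][j]) for i in rows for j in cols)
print(max_gap)
```

The fitted table is the same in both cases, while the row and column factors a reader would inspect differ, which is why a distinguished re-presentation is needed before fits are compared.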

If we are to compare the results of such a fit applied to two or more sets of data, we badly
need to seek out a distinguished re-presentation of each fit, at least so that the results will be
conveniently comparable. If one looks like

$$2AB + 2CD$$

when the other looks like

$$(A + C)(B + D) + (A - C)(B - D)$$

we may miss an instance of a striking resemblance, something we ought to take only the least
possible chance of doing.

Another way in which re-presentation can be important arises when we can find a
re-presentation, say,

$$E F + G H$$

in which one factor, say F, is very nearly constant. This offers us the opportunity to try a less
general stock, say

$$E^* + G H$$

with $E^*$ approximately given by E times the (nearly) common value of F.

Empirical fits with broad stocks need not -- and often should not -- be thought of as ends
in themselves. Often they play important roles in leading us to simpler fits, simpler fits which
may or may not gain a more or less normative character.

4. The structure of the Coale-Trussell models.

Let us look next at the internal structure of what might be thought of, by some at least,
as almost the antithesis of the higher-rank broad stocks we have been discussing. The
Coale-Trussell model schedules are traditionally thought of as involving one constant, m, and one
variable function of age a, G(a), in the form

$$f(a) = G(a)\, n(a)\, e^{m v(a)}$$

where n(a) and v(a) are fixed functions of the age, a.

Once we think of dealing with a single population at several dates (or in several cohorts)
or once we think of dealing with several populations, we need to subscript m and G(a). We
may as well also subscript a, since we will only be using discrete age-ranges. This gives us

$$f_i(a_j) = G_i(a_j)\, n(a_j)\, e^{m_i v(a_j)}$$

and, once we take the (natural) logarithm,

$$\log f_i(a_j) = \log n(a_j) + \log G_i(a_j) + m_i v(a_j)$$

which is of the form

$$y_{ij} = K_j + C_i D_j + E_i L_j$$

where K and L are fixed.

This is now obviously a special case of

$$B_j + C_i D_j + E_i F_j$$

an often useful, but special case of the rank-3 stock

$$A_i B_j + C_i D_j + E_i F_j$$
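
The logarithmic decomposition can be illustrated with a small sketch. All numbers below are hypothetical placeholders chosen for illustration; they are not the published Coale-Trussell values of n(a) or v(a), and the table `G` is an invented example of proportions by population and age range.

```python
import math

# Hypothetical fixed functions of age, one value per discrete age range a_j.
n = [0.50, 0.45, 0.40, 0.30, 0.15, 0.02]      # stands in for n(a_j)
v = [0.0, -0.3, -0.7, -1.0, -1.4, -1.7]       # stands in for v(a_j)

# Hypothetical population-specific quantities: m_i and G_i(a_j).
m = [0.0, 0.5, 1.2]
G = [[0.30, 0.60, 0.80, 0.85, 0.85, 0.80],
     [0.20, 0.50, 0.75, 0.80, 0.80, 0.75],
     [0.10, 0.40, 0.70, 0.80, 0.80, 0.75]]

pops, ages = range(len(m)), range(len(n))

# Schedule in its original form: f_i(a_j) = G_i(a_j) n(a_j) exp(m_i v(a_j))
f = [[G[i][j] * n[j] * math.exp(m[i] * v[j]) for j in ages] for i in pops]

# Logged form: K_j + log G_i(a_j) + E_i L_j with K_j = log n(a_j), E_i = m_i, L_j = v(a_j)
K = [math.log(x) for x in n]
L = v
y = [[K[j] + math.log(G[i][j]) + m[i] * L[j] for j in ages] for i in pops]

max_gap = max(abs(math.log(f[i][j]) - y[i][j]) for i in pops for j in ages)
print(max_gap)  # zero up to floating-point rounding
```

Only the terms K_j and L_j are held fixed across populations here, which is what makes the logged schedule a special case of the broader stock.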

Thus there is no necessary antithesis between such models and empirical higher-rank
analysis. There may well be a difference in purposes and in style. If we thought of the
empirical higher-rank analysis as an end in itself, a vast gap might indeed open up between the
approaches. But if, as we ought, we think of such analyses as the first step, in which the regular
behavior of the data is to be encompassed as thoroughly as we can (going to still higher rank
when necessary), so that we are ready to proceed to seek out as great a simplification of the
EHR fit as we believe the data and our purposes, combined, will sustain, there will be no
antithesis - and the gap may be very small.

It need not happen that, as the Danes are reputed to put it, we "fall with our nose in the
butter". The effective fewer-constant fits, if they exist, may not be such that they can be found
in such a way; it may not be possible to convert them into higher-rank form by re-expressing
the response. When this does happen, we will have to work with the facts as they are, but we
should not, I would argue, accept that it has happened without careful inquiry.

5. Comparison of the two models.

A few words about the detailed differences between Breckenridge's EHR analysis and the
Coale-Trussell schedules are in order. The first major difference is in the chosen response,
between

fertility at an age (for an age-interval)

and

accumulated fertility up to an age-cut
(as a fraction of total).

The fact that one takes the logarithm of the former, but the folded square-root of the latter is
also important, but perhaps not as important.

Beyond this, the question of how complicated - or simple -- a stock one uses (how broad
or how narrow) is mainly a matter of detailed purposes. (I argue strongly that the practical way
to begin is to fit the broad stock, going on then to whatever degree of reduction is appropriate
to the combination of data behavior and our purposes.)

What are the main issues of choice in this situation? I believe that the purposes toward
which the Coale-Trussell schedules are directed combine, to various degrees, those typical of

• descriptive models,

• separating models, and

• transferring models,

with decreasing emphasis as we move down the list. (Ansley Coale chose to emphasize the first
two of these, in an independent assessment.)

For the first purpose, description, we want to (a) make our fit to the diversity of the real
world as good as we can, subject to (b) holding the number of parameters to a minimum. For
this purpose, it should not be important whether we work with fertility in age ranges or with
accumulated fertility. Equally it should not matter what re-expression proves to be useful.

The analysis suggested in the closing paragraphs of Breckenridge's paper, in which
empirical higher-rank analysis would be applied to data from a wide variety of countries (and time
periods), would be a natural first step in an EHR search for just what description would be most
useful. To be fully effective, such an analysis ought to explore the advantages, not only of
age-range vs. accumulation, but also of varied re-expressions of each.

The absence of effective, experience-tested techniques for guiding the exploration of
re-expression in such situations is to be regretted, but we must start to learn somewhere.

Once we understand clearly how we can do relatively very well with both broad-stock
and narrow-stock fits, it will be time to ask how well the results serve our needs as separating
and transferring models. Then we can sensibly consider what changes in the structure of the
empirically best-fitting models it is wise or reasonable to make in order to do better in
separating and transferring.

We ought, in Student's words, plan to "use all the allowed principles of witchcraft".

6. References.

P. W. Bridgman 1927. The Logic of Modern Physics. Macmillan, New York.

Harold Jeffreys 1929. Random and systematic arrangements. Biometrika 31: 1-8.

R. D. Luce and J. W. Tukey 1964. Simultaneous and conjoint measurement. J. Math. Psych.
1-27.

F. Mosteller and J. W. Tukey 1977. Data Analysis and Regression: a second course in statistics.
Addison-Wesley Publishing Company, Reading, Massachusetts. Especially page 302ff.

"Student" 1937. Comparison between balanced and random arrangements of field plots.
Biometrika 29: 363-379. See page 365.


